Number of elements on the stack shouldn't affect the calculation
of ElemWidth(). Variable 'start' needs to be subtracted from the
loop variable 'i' to make indexing zero-based.
There is an additional unit test to pack nested vectors. Size of
the packed buffer *without* this fix is 798 and only 664 bytes
*with* the fix.