Use a small vector optimization for the aggregate stack.
Some time ago I tried allocating the full size up front, which decreased performance. Hopefully this hybrid (a small inline buffer, with heap allocation only on overflow) gets the best of both.
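
For readers unfamiliar with the technique, here is a minimal sketch of a small vector optimization applied to a stack. It is not the code in this commit: `SmallStack`, `Aggregate`, and the inline capacity are hypothetical names chosen for illustration, and this variant keeps overflow in a separate `std::vector` rather than moving everything to one contiguous heap buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type; the real aggregate stack's element type
// is not shown in this commit.
struct Aggregate {
    int value = 0;
};

// Stack that stores the first InlineCapacity elements inside the object
// itself and touches the heap only when it grows beyond that.
template <typename T, std::size_t InlineCapacity>
class SmallStack {
public:
    void push(const T& item) {
        if (size_ < InlineCapacity) {
            inline_[size_] = item;      // common case: no allocation
        } else {
            overflow_.push_back(item);  // rare case: spill to the heap
        }
        ++size_;
    }

    T pop() {
        assert(size_ > 0);
        --size_;
        if (size_ < InlineCapacity) {
            return inline_[size_];
        }
        // The top element lives at the back of the overflow vector.
        T item = overflow_.back();
        overflow_.pop_back();
        return item;
    }

    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }
    // True when at least one element currently lives on the heap.
    bool on_heap() const { return size_ > InlineCapacity; }

private:
    T inline_[InlineCapacity] = {};
    std::vector<T> overflow_;
    std::size_t size_ = 0;
};
```

With a shallow stack, `push` and `pop` never allocate, which avoids the up-front cost of reserving the full size. Deeper stacks still work and pay for heap storage only once they exceed the inline capacity.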