If we start guaranteeing that fields get returned in the same order as they were added, what are the costs?
I'm not yet sure, but I expect it to be a minor added cost; I'll know more as I dig in.
AFAIK, sorting the fields is necessary to group multiple values for the same field, and it also ensures that segments with the same fields get the same field numbers, which enables faster segment merging?
Actually, the field name -> number mapping happens before the sort, so at present we rely on docs having the same field order to enable bulk merging of stored fields & term vectors. Bulk merging is really a rather brittle optimization. We could improve it by only checking for matching name -> number mappings on fields that are stored or have term vectors enabled (right now we check that all fields match), and by pre-sorting the field names when mapping them to numbers.
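To make the two proposed improvements concrete, here is a rough, language-agnostic sketch in Python. It is not Lucene's actual code: `FieldInfo`, `assign_numbers`, and `can_bulk_merge` are hypothetical names, and numbering fields from a single list is a simplification of how numbers are really assigned across many docs.

```python
from typing import NamedTuple


class FieldInfo(NamedTuple):
    # Hypothetical stand-in for per-field metadata; not Lucene's class.
    name: str
    number: int
    stored: bool
    term_vectors: bool


def assign_numbers(fields, presort=False):
    """Map field name -> FieldInfo.

    `fields` is a list of (name, stored, term_vectors) tuples in the order
    the fields were first seen. With presort=True the names are sorted
    before numbering, so the numbering no longer depends on that order.
    """
    items = sorted(fields) if presort else list(fields)
    return {
        name: FieldInfo(name, number, stored, tv)
        for number, (name, stored, tv) in enumerate(items)
    }


def can_bulk_merge(a, b, only_relevant=False):
    """Return True if two segments agree on the name -> number mapping.

    With only_relevant=True, only fields that are stored or have term
    vectors are compared; with False, every field must match.
    """
    def view(infos):
        return {
            f.name: f.number
            for f in infos.values()
            if not only_relevant or f.stored or f.term_vectors
        }
    return view(a) == view(b)


# Same two stored fields, added in a different order in each segment.
seg1 = [("title", True, False), ("id", True, False)]
seg2 = [("id", True, False), ("title", True, False)]
print(can_bulk_merge(assign_numbers(seg1), assign_numbers(seg2)))
print(can_bulk_merge(assign_numbers(seg1, presort=True),
                     assign_numbers(seg2, presort=True)))
```

In this toy model the first check fails because the numbering follows the order the fields were added, and the second succeeds because both segments number the sorted names. Restricting the comparison with `only_relevant=True` additionally lets segments match when they differ only in fields that are neither stored nor have term vectors.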
I plan to just move the stored fields writer up in the indexing chain, so that it receives the in-order list of fields, not the coalesced & sorted list.
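As a rough illustration of that plan (a Python sketch, not Lucene's indexing chain; `process_document` is a hypothetical name), the stored fields consumer sees the fields exactly as added, while the coalesced & sorted view is built separately for the rest of the chain:

```python
from collections import defaultdict


def process_document(fields):
    """Split one document into the two views discussed above.

    `fields` is a list of (name, value) pairs in the order they were added.
    Returns (stored_in_order, coalesced_sorted).
    """
    # What a stored fields writer placed early in the chain would receive:
    # the untouched, in-order list.
    stored_in_order = list(fields)

    # What the later stages receive: values grouped per field name,
    # with the field names sorted.
    grouped = defaultdict(list)
    for name, value in fields:
        grouped[name].append(value)
    coalesced_sorted = sorted(grouped.items())

    return stored_in_order, coalesced_sorted


doc = [("title", "Lucene"), ("author", "a"), ("title", "in Action")]
stored, coalesced = process_document(doc)
print(stored)
print(coalesced)
```

The point of the split is that grouping and sorting stay available where they are needed, without disturbing the order that gets written for stored fields.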
Based on McCandless's comments in email, it sounds like order was only ever maintained for fields that don't use term vectors, in which case the documentation was only ever partially correct.
Actually, order was correctly maintained prior to 2.3. In 2.3, it was maintained only if you had no term vector fields (i.e., we only sorted when there was at least 1 field w/ term vectors enabled). In 2.4 we always sort, so order is never maintained. For 2.9 I think we should fix it again so that order is fully maintained.
For example, one can think of a simple way to improve the performance of loading only certain fields
I think that'd be a good improvement to how fields are stored!