Thanks, Lars; that was fast. I think this patch is going to be handy.
I'm wondering what people thought about an alternative approach to keeping stored fields from being too large, which would require mucking around with Lucene. In particular, the idea would be to allow field definitions like this:
<field name="body" type="text" indexed="true" stored="true"
       maxFieldLength="2000" storeOnlyAnalyzedText="true" />
Here we've made the normal Lucene maxFieldLength (i.e., the number of tokens to analyze) configurable on a field-by-field basis. And in this declaration we've also made what is stored a function of what is analyzed: if the first 2,000 tokens correspond to, say, the first 8,000 characters, then those 8,000 characters are what actually gets stored in the stored field. This seems a little more natural than lopping off the text after a fixed number of characters.
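To make the proposed storeOnlyAnalyzedText behavior concrete, here is a rough sketch of the truncation logic. Note that this is just an illustration: it uses a trivial whitespace tokenizer as a stand-in for a real Lucene Analyzer (a real implementation would walk the field's TokenStream and read each token's end offset), and the class and method names are hypothetical, not part of any existing API.

```java
// Hypothetical sketch of storeOnlyAnalyzedText: walk the tokens, remember the
// character offset where the last token within maxFieldLength ends, and store
// only the text up to that offset. A whitespace tokenizer stands in for the
// field's actual Analyzer.
public class StoreOnlyAnalyzedSketch {
    public static String truncateToTokens(String text, int maxTokens) {
        int tokens = 0;
        int cut = 0;
        int i = 0;
        int n = text.length();
        while (i < n && tokens < maxTokens) {
            // skip whitespace between tokens
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= n) break;
            // consume one token
            while (i < n && !Character.isWhitespace(text.charAt(i))) i++;
            cut = i;      // end offset of the last analyzed token
            tokens++;
        }
        // if the token limit was never reached, store the whole text
        return tokens < maxTokens ? text : text.substring(0, cut);
    }

    public static void main(String[] args) {
        // with maxTokens = 3, everything after the third token is dropped
        System.out.println(truncateToTokens("one two three four five", 3));
        // prints "one two three"
    }
}
```

The key point is that the cut falls on a token boundary reported by the analyzer, so the stored text always corresponds exactly to what was indexed.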
If I could do the above, I'm thinking I would use that single field for both searching and highlighting. But if you wanted a separate field for highlighting (and were willing to have things run slower than with the current patch), you could do this:
<field name="body" type="text" indexed="true" stored="false" omitNorms="false" />
<field name="highlighting" type="text" indexed="false" stored="true"
compressed="true" maxFieldLength="2000" storeOnlyAnalyzedText="true" />
<copyField src="body" dest="highlighting" />