Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
I would like to start a discussion about removing the limit of ~2B documents per index (doc IDs are Java ints, so an index is capped at roughly Integer.MAX_VALUE documents), while still enforcing that limit at the segment level for practical reasons.
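A minimal sketch of the proposed split, assuming each segment's document count stays an int bounded by a per-segment cap while the index-wide total is accumulated as a long. `SegmentLimits`, `totalDocs` and `MAX_DOCS_PER_SEGMENT` are hypothetical names, not Lucene API, and the cap value (`Integer.MAX_VALUE - 128`) is an assumption chosen for illustration.

```java
/** Hypothetical sketch: int per-segment doc counts, long index-wide total. */
public final class SegmentLimits {

  // Assumed per-segment cap (illustrative; not taken from Lucene).
  static final int MAX_DOCS_PER_SEGMENT = Integer.MAX_VALUE - 128;

  private SegmentLimits() {}

  /** Validates each segment's doc count and returns the index-wide total as a long. */
  static long totalDocs(int[] segmentMaxDocs) {
    long total = 0;
    for (int maxDoc : segmentMaxDocs) {
      // The limit is still enforced per segment, where doc IDs remain ints.
      if (maxDoc < 0 || maxDoc > MAX_DOCS_PER_SEGMENT) {
        throw new IllegalArgumentException("invalid segment maxDoc: " + maxDoc);
      }
      total += maxDoc; // long accumulation, so the sum may exceed Integer.MAX_VALUE
    }
    return total;
  }

  public static void main(String[] args) {
    int[] segments = {MAX_DOCS_PER_SEGMENT, MAX_DOCS_PER_SEGMENT, 1_000};
    int intSum = 0;
    for (int maxDoc : segments) {
      intSum += maxDoc; // int addition silently wraps around on overflow
    }
    System.out.println("int total:  " + intSum);
    System.out.println("long total: " + totalDocs(segments));
  }
}
```

With these three segments the int sum wraps to 742, while the long total is 4294968038; only the top-level count needs the wider type.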
Postings, stored fields, and all other codec APIs would keep using ints to represent doc IDs. Only top-level doc IDs and document counts would need to move to longs. I say "only" because we now mostly consume indices per segment, but there are still a number of places where we identify documents by their top-level doc ID, such as IndexReader#document, top-docs collectors, etc.
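To illustrate how top-level long doc IDs could sit on top of int per-segment doc IDs, here is a hedged sketch assuming each segment gets a long doc base (the sum of the sizes of all preceding segments). A top-level ID is the doc base plus the segment-local int ID, and resolving it back uses a binary search for the last segment whose base is not greater than the ID, which also skips empty segments. `LongDocIdMapper`, `SegmentDoc`, `toGlobal` and `toLocal` are hypothetical names, not Lucene API; the code requires Java 16+ for the record.

```java
/** Hypothetical sketch: long top-level doc IDs over int segment-local doc IDs. */
public final class LongDocIdMapper {

  /** A segment-local address: segment ordinal plus the int doc ID inside it. */
  record SegmentDoc(int segmentOrd, int localDoc) {}

  private final long[] docBases; // docBases[i] = total docs in segments 0..i-1
  private final long maxDoc;     // total docs across all segments (may exceed 2^31 - 1)

  LongDocIdMapper(int[] segmentMaxDocs) {
    docBases = new long[segmentMaxDocs.length];
    long base = 0;
    for (int i = 0; i < segmentMaxDocs.length; i++) {
      if (segmentMaxDocs[i] < 0) {
        throw new IllegalArgumentException("negative maxDoc for segment " + i);
      }
      docBases[i] = base;
      base += segmentMaxDocs[i]; // accumulate in a long
    }
    maxDoc = base;
  }

  long maxDoc() {
    return maxDoc;
  }

  /** Converts a segment-local int doc ID to a top-level long doc ID. */
  long toGlobal(int segmentOrd, int localDoc) {
    return docBases[segmentOrd] + localDoc;
  }

  /** Resolves a top-level long doc ID to its segment and segment-local int doc ID. */
  SegmentDoc toLocal(long globalDoc) {
    if (globalDoc < 0 || globalDoc >= maxDoc) {
      throw new IllegalArgumentException("doc out of range: " + globalDoc);
    }
    // Find the largest i with docBases[i] <= globalDoc; docBases[0] == 0 guarantees one exists.
    int lo = 0;
    int hi = docBases.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1; // upper midpoint so the loop always makes progress
      if (docBases[mid] <= globalDoc) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    // The difference is smaller than that segment's int maxDoc, so the cast is safe.
    return new SegmentDoc(lo, (int) (globalDoc - docBases[lo]));
  }

  public static void main(String[] args) {
    int big = Integer.MAX_VALUE - 128;
    // The index total exceeds Integer.MAX_VALUE, yet every segment-local ID fits in an int.
    LongDocIdMapper mapper = new LongDocIdMapper(new int[] {big, big, 10});
    long global = mapper.toGlobal(2, 5);
    System.out.println(global + " -> " + mapper.toLocal(global));
  }
}
```

Under this scheme, only the top-level entry points (for example, something like IndexReader#document or the merging step of a top-docs collector) would deal in longs; once a hit is routed to its segment, codec-level code continues to see plain ints.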