I'm worried about a few things:
I think the limit is OK, because in my eyes it's the limit of a single term. I feel that anyone arguing for increasing the limit only has abuse cases (not use cases) in mind. I'm worried about making DV more complicated for no good reason.
I guess I see DV binary as more like a stored field, just stored column-stride for faster access. Faceting (and I guess spatial) encode many things inside one DV binary field.
I'm worried about opening up the possibility of bugs and index corruption (e.g. clearly MULTIPLE people on this issue don't understand why you cannot just remove IndexWriter's limit without causing corruption).
I agree this is a concern and we need to take it slow, add good tests.
I'm really worried about the precedent: once these abuse-case fans have their way and increase this limit, they will next argue that we should do the same for SORTED, maybe SORTED_SET, maybe even inverted terms. They will argue that it's the same as binary, just with sorting, and ask why sorting should bring in additional limits. I can easily see this all spinning out of control.
I think that most people hitting the limit are abusing docvalues as stored fields, so the limit is actually doing something useful today: it tells them they are doing something wrong.
I don't think we should change the limit for SORTED/SORTED_SET nor terms: I think we should raise the limit ONLY for BINARY, and declare that DV BINARY is for these "abuse" cases. So if you really really want sorted set with a higher limit, then you will have to encode it yourself into DV BINARY.
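To illustrate what "encode it yourself into DV BINARY" could look like, here is a minimal sketch of one possible scheme: length-prefixing each value and concatenating them into a single blob that could then be stored in one binary doc value. This is not Lucene's faceting encoding, just a plain-Java illustration with a hypothetical `MultiValueCodec` class; the format (int count, then int-length-prefixed payloads) is an assumption for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: pack several values into one blob suitable for a single BINARY doc value. */
public class MultiValueCodec {

  /** Length-prefix each value and concatenate into one byte[]. */
  static byte[] encode(List<byte[]> values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(values.size());          // number of packed values
    for (byte[] v : values) {
      out.writeInt(v.length);             // length prefix
      out.write(v);                       // raw payload
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Reverse of encode: split the blob back into its values. */
  static List<byte[]> decode(byte[] blob) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
    int count = in.readInt();
    List<byte[]> values = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      byte[] v = new byte[in.readInt()];
      in.readFully(v);
      values.add(v);
    }
    return values;
  }

  public static void main(String[] args) throws IOException {
    List<byte[]> values = List.of("foo".getBytes(), "bar-baz".getBytes());
    byte[] blob = encode(values);
    for (byte[] v : decode(blob)) {
      System.out.println(new String(v));
    }
  }
}
```

A real encoding would likely use variable-length ints and sorted/deduplicated values to mimic SORTED_SET semantics, but the point is the same: the multi-value structure lives inside the application's own format, not in the DV API.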
The only argument I have for removing the limit is that by expanding BINARY's possible abuse cases (in my opinion, that's pretty much all it's useful for), we might prevent additional complexity from being added elsewhere to DV in the long term.