  1. Solr
  2. SOLR-11240

Raise UnInvertedField internal limit


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.5.4, 6.6
    • Fix Version/s: 7.1, 8.0
    • Component/s: faceting

    Description

      UnInvertedField has, via DocTermOrds, an internal limit of 2^24 bytes for the byte arrays that hold term ordinals. For String faceting on high-cardinality Text fields, this can trigger an exception with the message "Too many values for UnInvertedField". A search for that phrase shows that the exception is encountered in the wild.

      The limitation is due to the packing being a combination of values and pointers: if the values (term ordinals) for a given document ID fit in an integer, they are stored directly in it. If the value of the first 8 bits in the integer is 1, this signals that the following 3 bytes (24 bits) are a pointer into a byte array, which limits the array size to 16M (2^24).
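
      The pointer side of the scheme described above can be sketched as follows. This is a minimal illustration, not the DocTermOrds source: the class and method names are hypothetical, and it assumes that "the first 8 bits" means the low byte of the integer. How inline values avoid colliding with the marker byte is not covered here.

```java
// Illustrative sketch only; names are hypothetical and not taken from DocTermOrds.
public class OldPacking {
    // 24 pointer bits give 2^24 = 16M addressable bytes.
    static final int MAX_POINTER = 1 << 24;

    // Assumption: the marker lives in the low 8 bits and has the value 1.
    static boolean isPointer(int code) {
        return (code & 0xFF) == 1;
    }

    // The remaining 24 bits hold the offset into the byte array.
    static int pointer(int code) {
        return code >>> 8;
    }

    static int encodePointer(int pos) {
        if (pos < 0 || pos >= MAX_POINTER) {
            // Mirrors the message quoted in the description.
            throw new IllegalStateException("Too many values for UnInvertedField");
        }
        return (pos << 8) | 1;
    }
}
```

      Under these assumptions, any offset at or beyond 2^24 cannot be encoded, which is where the exception comes from.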

      Solution: Because the values are packed as vInts, bit 31 (the last bit) of the integer will never be 1 if the integer contains values. This means that this bit can be used to signal whether the preceding bits should be parsed as values or as a pointer. The effective pointer range is thus 2^31, which matches the array-length limit in Java. Changing the signalling mechanism does not affect space requirements and should not affect performance.
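
      A minimal sketch of the bit-31 signalling, under stated assumptions: the names are hypothetical and not taken from the patch, values are non-negative, and vInt bytes are placed in the integer starting at the low byte. It shows why inline values never set bit 31: the byte that ends up in the top position is either zero padding or the final byte of a vInt, whose high bit is clear.

```java
// Illustrative sketch only; names are hypothetical and not taken from the patch.
public class HighBitPacking {
    static final int POINTER_FLAG = 0x80000000; // bit 31

    static boolean isPointer(int code) {
        return (code & POINTER_FLAG) != 0;
    }

    // The lower 31 bits hold the offset into the byte array.
    static int pointer(int code) {
        return code & 0x7FFFFFFF;
    }

    static int encodePointer(int pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        return pos | POINTER_FLAG;
    }

    // Packs values as vInts (7 data bits per byte, high bit set on every
    // byte except the last of each value) into one int, low byte first.
    static int packInline(int[] values) {
        int packed = 0;
        int shift = 0;
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                if (shift > 24) throw new IllegalArgumentException("does not fit in 4 bytes");
                packed |= ((v & 0x7F) | 0x80) << shift; // continuation byte
                shift += 8;
                v >>>= 7;
            }
            if (shift > 24) throw new IllegalArgumentException("does not fit in 4 bytes");
            packed |= v << shift; // final byte of this value, high bit clear
            shift += 8;
        }
        return packed;
    }
}
```

      In this sketch a reader would test isPointer first and only then decode the integer as inline vInts or as an offset, so the flag costs no extra space.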

      Note that this is only a 128-fold (2^7) increase over the 2^24 limit, not an elimination: performing uninverted Text field faceting on 100M documents with 5K terms each will still raise an exception.

      Attachments

        1. SOLR-11240.patch
          18 kB
          Toke Eskildsen
        2. SOLR-11240.patch
          17 kB
          Toke Eskildsen
        3. SOLR-11240.patch
          17 kB
          Toke Eskildsen

          People

            toke Toke Eskildsen
            toke Toke Eskildsen
