Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-10315

Speed up BKD leaf block ids codec by a 512 ints ForUtil

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 9.2
    • None
    • None
    • New

    Description

      Elasticsearch (which based on lucene) can automatically infers types for users with its dynamic mapping feature. When users index some low cardinality fields, such as gender / age / status... they often use some numbers to represent the values, while ES will infer these fields as long, and ES uses BKD as the index of long fields. When the data volume grows, building the result set of low-cardinality fields will make the CPU usage and load very high.

      This is a flame graph we obtained from the production environment:
      addall.svg

      It can be seen that almost all CPU is used in addAll. When we reindex long to keyword, the cluster load and search latency are greatly reduced ( We spent weeks of time to reindex all indices... ). I know that ES recommended to use keyword for term/terms query and long for range query in the document, but there are always some users who didn't realize this and keep their habit of using sql database, or dynamic mapping automatically selects the type for them. All in all, users won't realize that there would be such a big difference in performance between long and keyword fields in low cardinality fields. So from my point of view it will make sense if we can make BKD works better for the low/medium cardinality fields.

      As far as i can see, for low cardinality fields, there are two advantages of keyword over long:

      1. ForUtil used in keyword postings is much more efficient than BKD's delta VInt, because its batch reading (readLongs) and SIMD decode.
      2. When the query term count is less than 16, TermsInSetQuery can lazily materialize of its result set, and when another small result clause intersects with this low cardinality condition, the low cardinality field can avoid reading all docIds into memory.

      This ISSUE is targeting to solve the first point. The basic idea is trying to use a 512 ints ForUtil for BKD ids codec. I benchmarked this optimization by mocking some random LongPoint and querying them with PointInSetQuery.

      Benchmark Result

      doc count field cardinality query point baseline QPS candidate QPS diff percentage
      100000000 32 1 51.44 148.26 188.22%
      100000000 32 2 26.8 101.88 280.15%
      100000000 32 4 14.04 53.52 281.20%
      100000000 32 8 7.04 28.54 305.40%
      100000000 32 16 3.54 14.61 312.71%
      100000000 128 1 110.56 350.26 216.81%
      100000000 128 8 16.6 89.81 441.02%
      100000000 128 16 8.45 48.07 468.88%
      100000000 128 32 4.2 25.35 503.57%
      100000000 128 64 2.13 13.02 511.27%
      100000000 1024 1 536.19 843.88 57.38%
      100000000 1024 8 109.71 251.89 129.60%
      100000000 1024 32 33.24 104.11 213.21%
      100000000 1024 128 8.87 30.47 243.52%
      100000000 1024 512 2.24 8.3 270.54%
      100000000 8192 1 3333.33 5000 50.00%
      100000000 8192 32 139.47 214.59 53.86%
      100000000 8192 128 54.59 109.23 100.09%
      100000000 8192 512 15.61 36.15 131.58%
      100000000 8192 2048 4.11 11.14 171.05%
      100000000 1048576 1 2597.4 3030.3 16.67%
      100000000 1048576 32 314.96 371.75 18.03%
      100000000 1048576 128 99.7 116.28 16.63%
      100000000 1048576 512 30.5 37.15 21.80%
      100000000 1048576 2048 10.38 12.3 18.50%
      100000000 8388608 1 2564.1 3174.6 23.81%
      100000000 8388608 32 196.27 238.95 21.75%
      100000000 8388608 128 55.36 68.03 22.89%
      100000000 8388608 512 15.58 19.24 23.49%
      100000000 8388608 2048 4.56 5.71 25.22%

      The indices size is reduced for low cardinality fields and flat for high cardinality fields.

      113M    index_100000000_doc_32_cardinality_baseline
      114M    index_100000000_doc_32_cardinality_candidate
      
      140M    index_100000000_doc_128_cardinality_baseline
      133M    index_100000000_doc_128_cardinality_candidate
      
      193M    index_100000000_doc_1024_cardinality_baseline
      174M    index_100000000_doc_1024_cardinality_candidate
      
      241M    index_100000000_doc_8192_cardinality_baseline
      233M    index_100000000_doc_8192_cardinality_candidate
      
      314M    index_100000000_doc_1048576_cardinality_baseline
      315M    index_100000000_doc_1048576_cardinality_candidate
      
      392M    index_100000000_doc_8388608_cardinality_baseline
      391M    index_100000000_doc_8388608_cardinality_candidate
      

      Attachments

        1. addall.svg
          634 kB
          Feng Guo
        2. cpu_profile_baseline.html
          131 kB
          Ignacio Vera
        3. cpu_profile_path.html
          97 kB
          Ignacio Vera

        Activity

          People

            gf2121 Feng Guo
            gf2121 Feng Guo
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 7h 10m
                7h 10m