Lucene - Core / LUCENE-8673

Use radix partitioning when merging dimensional points

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.x, master (9.0)
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Following the advice of Adrien Grand in LUCENE-8623, I have investigated using radix selection when merging segments instead of sorting the data at the beginning. The results are pretty promising when running the Lucene geo benchmarks:

       

      ||Approach||Index time (sec): Dev||Index time (sec): Base||Index time: Diff||Force merge time (sec): Dev||Force merge time (sec): Base||Force merge time: Diff||Index size (GB): Dev||Index size (GB): Base||Index size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
      |points|241.5s|235.0s|3%|157.2s|157.9s|-0%|0.55|0.55|0%|1.57|1.57|0%|
      |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29|0%|1.61|1.61|0%|
      |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75|0%|1.58|1.58|0%|

       
      edited: table formatting to be a jira table
       

      In 2D the indexing throughput is more or less equal, but for higher dimensions the impact is quite big. In all cases the merging process requires much less disk space. I am attaching plots showing the different behaviour and I am opening a pull request.
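
To illustrate the general idea of radix selection as opposed to a full sort, here is a minimal in-memory sketch. It is not the code from the pull request: the class name `RadixSelectSketch`, the `select` signature, and the assumption that every key is a fixed-length byte array compared as unsigned bytes are all hypothetical and chosen only for illustration. At each byte position it builds a histogram, finds the bucket containing the target position `k`, partitions around that bucket, and then refines only that bucket on the next byte.

```java
import java.util.Arrays;

final class RadixSelectSketch {

  /**
   * Rearranges keys[from, to) so that keys[k] holds the key that a full
   * unsigned byte-wise sort would place there, with smaller-or-equal keys
   * before it and larger-or-equal keys after it.
   * Assumes from <= k < to and that every key has length keyLength.
   */
  static void select(byte[][] keys, int from, int to, int k, int keyLength) {
    for (int byteIndex = 0; byteIndex < keyLength && to - from > 1; byteIndex++) {
      // Histogram of the current byte over the active range.
      int[] counts = new int[256];
      for (int i = from; i < to; i++) {
        counts[keys[i][byteIndex] & 0xFF]++;
      }
      // Walk the histogram to find the bucket that contains position k.
      int bucket = 0;
      int bucketStart = from;
      while (bucketStart + counts[bucket] <= k) {
        bucketStart += counts[bucket];
        bucket++;
      }
      int bucketEnd = bucketStart + counts[bucket];
      // Three-way partition: [ < bucket | == bucket | > bucket ].
      int lt = from;
      int i = from;
      int gt = to - 1;
      while (i <= gt) {
        int b = keys[i][byteIndex] & 0xFF;
        if (b < bucket) {
          swap(keys, lt++, i++);
        } else if (b > bucket) {
          swap(keys, i, gt--);
        } else {
          i++;
        }
      }
      // Only the bucket holding k needs refining on the next byte.
      from = bucketStart;
      to = bucketEnd;
    }
  }

  private static void swap(byte[][] keys, int a, int b) {
    byte[] tmp = keys[a];
    keys[a] = keys[b];
    keys[b] = tmp;
  }

  public static void main(String[] args) {
    byte[][] keys = { {3, 1}, {1, 2}, {2, 0}, {1, 1}, {3, 0} };
    select(keys, 0, keys.length, 2, 2);
    System.out.println(Arrays.toString(keys[2])); // prints [2, 0], the median key
  }
}
```

The sketch only orders the data as far as needed to place position `k`, and the buckets on either side of the one containing `k` are never examined again. That is the general reason selection can be cheaper than sorting everything up front.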


        Attachments

        1. Geo3D.png
          8 kB
          Ignacio Vera
        2. LatLonPoint.png
          7 kB
          Ignacio Vera
        3. LatLonShape.png
          8 kB
          Ignacio Vera
        4. LatLonShape.png
          8 kB
          Ignacio Vera
        5. Geo3D.png
          8 kB
          Ignacio Vera
        6. LatLonPoint.png
          7 kB
          Ignacio Vera
        7. LatLonPoint.png
          7 kB
          Ignacio Vera
        8. Geo3D.png
          8 kB
          Ignacio Vera
        9. LatLonShape.png
          8 kB
          Ignacio Vera

              People

              • Assignee: Ignacio Vera (ivera)
              • Reporter: Ignacio Vera (ivera)
              • Votes: 0
              • Watchers: 3


                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h 40m