  1. Lucene - Core
  2. LUCENE-8673

Use radix partitioning when merging dimensional points

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.x, 9.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Following the advice of jpountz in LUCENE-8623, I have investigated using radix selection when merging segments instead of sorting the data at the beginning. The results are pretty promising when running the Lucene geo benchmarks:

       

      ||Approach||Index time (sec): Dev||Index time (sec): Base||Index time: Diff||Force merge time (sec): Dev||Force merge time (sec): Base||Force merge time: Diff||Index size (GB): Dev||Index size (GB): Base||Index size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
      |points|241.5s|235.0s|3%|157.2s|157.9s|-0%|0.55|0.55|0%|1.57|1.57|0%|
      |shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29|0%|1.61|1.61|0%|
      |geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75|0%|1.58|1.58|0%|

       
      edited: table formatting to be a jira table
       

      In 2D the indexing throughput is more or less equal, but for higher dimensions the impact is quite big. In all cases the merging process requires much less disk space. I am attaching plots showing the different behaviour and I am opening a pull request.
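
To make the idea concrete, here is a minimal sketch of radix selection over fixed-length byte keys. It is not the code from the pull request: the class `RadixSelectSketch` and its methods are hypothetical names chosen for illustration, and the sketch assumes all keys have the same length and are compared as unsigned bytes. Instead of fully sorting the keys, it builds a histogram on one byte at a time, keeps only the bucket that contains position `k`, and recurses into that bucket on the next byte.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Lucene.
final class RadixSelectSketch {

    /**
     * Reorders keys so that keys[k] holds the value a full sort would place there,
     * every key before k compares <= keys[k] and every key after k compares >= keys[k]
     * (unsigned byte order). Assumes all keys have the same length.
     */
    static void select(byte[][] keys, int k) {
        if (k < 0 || k >= keys.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int keyLength = keys[0].length;
        int from = 0;
        int to = keys.length;
        // Invariant: all keys in [from, to) share their first byteIndex bytes.
        for (int byteIndex = 0; byteIndex < keyLength && to - from > 1; byteIndex++) {
            // Histogram of the current byte over the active range.
            int[] counts = new int[256];
            for (int i = from; i < to; i++) {
                counts[keys[i][byteIndex] & 0xFF]++;
            }
            // Locate the bucket whose slot range [bucketStart, bucketEnd) contains k.
            int bucket = 0;
            int bucketStart = from;
            while (bucketStart + counts[bucket] <= k) {
                bucketStart += counts[bucket];
                bucket++;
            }
            int bucketEnd = bucketStart + counts[bucket];
            // Move smaller bytes left and larger bytes right of that bucket.
            partition(keys, from, to, byteIndex, bucket);
            // Only the bucket containing k needs further work.
            from = bucketStart;
            to = bucketEnd;
        }
    }

    /** Three-way partition of keys[from, to) around the byte value pivot at byteIndex. */
    private static void partition(byte[][] keys, int from, int to, int byteIndex, int pivot) {
        int lt = from;
        int i = from;
        int gt = to - 1;
        while (i <= gt) {
            int b = keys[i][byteIndex] & 0xFF;
            if (b < pivot) {
                swap(keys, lt++, i++);
            } else if (b > pivot) {
                swap(keys, i, gt--);
            } else {
                i++;
            }
        }
    }

    private static void swap(byte[][] keys, int a, int b) {
        byte[] tmp = keys[a];
        keys[a] = keys[b];
        keys[b] = tmp;
    }

    public static void main(String[] args) {
        byte[][] keys = { {3, 1}, {0, 9}, {3, 0}, {(byte) 200, 5}, {1, 7} };
        int k = keys.length / 2;
        select(keys, k);
        System.out.println("median key: " + Arrays.toString(keys[k]));
    }
}
```

Splitting a set of points around a median only needs this kind of partial ordering, which is why selection can replace an up-front sort; buckets that do not contain `k` are never revisited.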

       

       

       

      Attachments

        1. Geo3D.png
          8 kB
          Ignacio Vera
        2. Geo3D.png
          8 kB
          Ignacio Vera
        3. Geo3D.png
          8 kB
          Ignacio Vera
        4. LatLonPoint.png
          7 kB
          Ignacio Vera
        5. LatLonPoint.png
          7 kB
          Ignacio Vera
        6. LatLonPoint.png
          7 kB
          Ignacio Vera
        7. LatLonShape.png
          8 kB
          Ignacio Vera
        8. LatLonShape.png
          8 kB
          Ignacio Vera
        9. LatLonShape.png
          8 kB
          Ignacio Vera

            People

              Assignee: ivera Ignacio Vera
              Reporter: ivera Ignacio Vera
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5h 40m