  1. Lucene - Core
  2. LUCENE-7390

Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's indexing buffer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      With Lucene's default codec, when writing dimensional points, we only give BKDWriter 16 MB of heap to use for sorting, regardless of how large IW's indexing buffer is. A custom codec can change this, but requiring one just to raise the limit is a steep ask.

      I've been testing indexing performance on a points-heavy dataset, 1.2 billion taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml , indexing with a 1 GB IW buffer. The small 16 MB heap limit causes clear performance problems: flushing the large segments forces BKDWriter to switch to offline sorting, which causes the DWPTs to take too long to flush. They then fall behind, and Lucene does a hard stall on incoming indexing threads until they catch up.
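
      To make the arithmetic concrete, here is a rough sketch in plain Java, with no Lucene dependency, of why a fixed 16 MB budget is exhausted quickly. The class SortHeapEstimate and its methods are hypothetical, and the per-point cost (packed value bytes plus a 4-byte docID) is an assumption for illustration; BKDWriter's real accounting is more involved.

```java
// Hypothetical helper, not part of Lucene: estimates whether one field's points
// can be sorted within a given heap budget.
final class SortHeapEstimate {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Assumed per-point cost: packed value bytes plus a 4-byte docID. */
    static long bytesPerPoint(int numDims, int bytesPerDim) {
        return (long) numDims * bytesPerDim + Integer.BYTES;
    }

    /** Largest number of points that fits in the budget under the assumption above. */
    static long maxPointsInHeap(double maxMBSortInHeap, int numDims, int bytesPerDim) {
        long budgetBytes = (long) (maxMBSortInHeap * BYTES_PER_MB);
        return budgetBytes / bytesPerPoint(numDims, bytesPerDim);
    }

    /** True when the point count exceeds what the heap budget can hold. */
    static boolean needsOfflineSort(long pointCount, double maxMBSortInHeap,
                                    int numDims, int bytesPerDim) {
        return pointCount > maxPointsInHeap(maxMBSortInHeap, numDims, bytesPerDim);
    }

    public static void main(String[] args) {
        // One-dimensional, 4-byte values (for example an int field) and a 16 MB budget.
        System.out.println("points that fit: " + maxPointsInHeap(16.0, 1, 4));
        // An illustrative segment size; the real count depends on the documents indexed.
        System.out.println("offline sort needed: " + needsOfflineSort(50_000_000L, 16.0, 1, 4));
    }
}
```

      Under these assumptions the budget covers only a few million points per field, so a segment flushed from a large indexing buffer can easily exceed it.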

      Robert Muir had a simple idea to let IW pass the allowed temp heap usage to PointsWriter.writeField.
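
      A minimal sketch of that idea, again in plain Java rather than against Lucene's actual classes. The interface SketchPointsWriter, the class SortHeapBudget, and the fraction and floor constants are all hypothetical; they are not the values or signatures used by the attached patch.

```java
import java.util.List;

/** Hypothetical stand-in for a points writer; not Lucene's PointsWriter API. */
interface SketchPointsWriter {
    void writeField(String fieldName, double maxMBSortInHeap);
}

/** Hypothetical helper that derives the sort heap budget from the indexing buffer. */
final class SortHeapBudget {
    // Both constants are assumptions chosen for illustration only.
    static final double ASSUMED_FRACTION = 0.25;
    static final double ASSUMED_FLOOR_MB = 16.0;

    /** Budget grows with the indexing buffer but never drops below the floor. */
    static double forIndexingBuffer(double ramBufferSizeMB) {
        return Math.max(ASSUMED_FLOOR_MB, ASSUMED_FRACTION * ramBufferSizeMB);
    }

    /** The caller computes the budget once and hands it to every writeField call. */
    static void flushFields(List<String> fieldNames, double ramBufferSizeMB,
                            SketchPointsWriter writer) {
        double budgetMB = forIndexingBuffer(ramBufferSizeMB);
        for (String fieldName : fieldNames) {
            writer.writeField(fieldName, budgetMB);
        }
    }
}
```

      The point of the design is that the writer no longer hard-codes its own limit: whoever owns the indexing buffer decides how much temporary heap sorting may use, so a custom codec is not needed to change it.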

        Attachments

        1. LUCENE-7390.patch
          22 kB
          Michael McCandless
        2. LUCENE-7390.patch
          20 kB
          Michael McCandless

            People

            • Assignee: Unassigned
            • Reporter: Michael McCandless (mikemccand)
