Lucene - Core / LUCENE-7390

Let BKDWriter use temp heap for sorting points in proportion to IndexWriter's indexing buffer


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      With Lucene's default codec, when writing dimensional points, we only give BKDWriter 16 MB of heap to use for sorting, regardless of how large IW's indexing buffer is. A custom codec can change this, but that's a little steep.

      I've been testing indexing performance on a points-heavy dataset, 1.2 billion taxi rides from http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml , indexing with a 1 GB IW buffer. The small 16 MB heap limit causes clear performance problems: flushing the large segments forces BKDWriter to switch to offline sorting, which causes the DWPTs to take too long to flush. They then fall behind, and Lucene does a hard stall on incoming indexing threads until they catch up.
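
To make the size mismatch concrete, here is a rough back-of-the-envelope sketch. It is not BKDWriter's actual accounting: the class name, the `fitsInHeap` helper, the 10 million point count, and the 12 bytes per point are all assumptions chosen for illustration. It only shows why a segment flushed from a large indexing buffer can easily exceed a fixed 16 MB sort budget.

```java
public class HeapSortBudgetSketch {

    /**
     * Hypothetical helper: returns true if pointCount points of
     * bytesPerPoint bytes each fit within the given heap budget.
     * The real writer tracks more state than this.
     */
    static boolean fitsInHeap(long pointCount, int bytesPerPoint, double heapBudgetMB) {
        long budgetBytes = (long) (heapBudgetMB * 1024 * 1024);
        return pointCount * bytesPerPoint <= budgetBytes;
    }

    public static void main(String[] args) {
        // Assumed figures, not measurements from the issue:
        // 10 million points, 12 bytes each (for example an 8-byte value plus a 4-byte doc ID).
        long pointCount = 10_000_000L;
        int bytesPerPoint = 12;

        // 120,000,000 bytes does not fit in 16 MB, so sorting would go offline.
        System.out.println("fits in 16 MB: " + fitsInHeap(pointCount, bytesPerPoint, 16.0));

        // The same data fits comfortably in a 1 GB budget.
        System.out.println("fits in 1 GB:  " + fitsInHeap(pointCount, bytesPerPoint, 1024.0));
    }
}
```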

      rcmuir had a simple idea to let IW pass the allowed temp heap usage to PointsWriter.writeField.
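
The idea can be sketched roughly as follows. This is a minimal illustration and not the attached patch: the `PointsSink` interface, the `sortHeapBudgetMB` helper, the 1/8 fraction, and the use of 16 MB as a floor are all assumptions made for the example. It only shows the shape of the change, where the caller derives a sort budget from the indexing buffer size and hands it to the points writer.

```java
public class ProportionalSortBudgetSketch {

    // Assumption: keep the existing 16 MB limit as a lower bound.
    static final double FLOOR_MB = 16.0;

    /** Hypothetical stand-in for a points writer that accepts a sort heap budget. */
    interface PointsSink {
        void writeField(String fieldName, double maxSortHeapMB);
    }

    /**
     * Derives the temp heap allowed for sorting as a fraction of the
     * indexing buffer, never dropping below FLOOR_MB.
     */
    static double sortHeapBudgetMB(double indexingBufferMB, double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1], got " + fraction);
        }
        return Math.max(FLOOR_MB, indexingBufferMB * fraction);
    }

    public static void main(String[] args) {
        // The sink just reports the budget it was given.
        PointsSink sink = (field, mb) ->
            System.out.println("writing " + field + " with " + mb + " MB sort heap");

        // 1 GB indexing buffer, assumed fraction of 1/8.
        sink.writeField("pickup_location", sortHeapBudgetMB(1024.0, 0.125));

        // A small buffer falls back to the floor.
        sink.writeField("pickup_location", sortHeapBudgetMB(64.0, 0.125));
    }
}
```

Keeping a floor means small indexing buffers behave as before, while large buffers get a sort budget that grows with them.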

      Attachments

        1. LUCENE-7390.patch
          20 kB
          Michael McCandless
        2. LUCENE-7390.patch
          22 kB
          Michael McCandless

          People

            Assignee: Unassigned
            Reporter: Michael McCandless (mikemccand)
            Votes: 0
            Watchers: 2
