Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: None
    • Labels:
      None

      Description

      Creating the docMap is expensive, and it is only used during segment merging, not during searching, so this change defers its creation until it is first requested.

      SegmentMergeInfo is also used in MultiTermEnum, the term enumerator for a MultiReader, so every term enumeration over a multi-segment index paid the cost of building the docMap. TermEnum is used by queries such as PrefixQuery, RangeQuery, and WildcardQuery, by RangeFilter and DateFilter, and when sorting on a field for the first time (filling the FieldCache).
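      The deferral described above is ordinary lazy initialization. The Java sketch below is illustrative only and is not the code from the attached patch: the class `SegmentInfoSketch`, its `IntPredicate` deletion check, and the `buildCount` counter are hypothetical stand-ins for SegmentMergeInfo and its IndexReader. It assumes the docMap maps each old document number to its post-merge number, with -1 for deleted documents, and builds it only on the first call to `getDocMap()`.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

public class LazyDocMapDemo {

    /** Hypothetical, simplified stand-in for a per-segment merge-info object. */
    static final class SegmentInfoSketch {
        private final int maxDoc;
        private final boolean hasDeletions;
        private final IntPredicate isDeleted; // hypothetical deletion check
        private int[] docMap;                 // null until first requested
        int buildCount = 0;                   // demonstration counter only

        SegmentInfoSketch(int maxDoc, boolean hasDeletions, IntPredicate isDeleted) {
            this.maxDoc = maxDoc;
            this.hasDeletions = hasDeletions;
            this.isDeleted = isDeleted;
        }

        /** Builds the old-to-new doc map on first use; returns null if there are no deletions. */
        int[] getDocMap() {
            if (docMap == null && hasDeletions) {
                buildCount++;
                docMap = new int[maxDoc];
                int next = 0;
                for (int i = 0; i < maxDoc; i++) {
                    // Deleted docs map to -1; live docs get consecutive new numbers.
                    docMap[i] = isDeleted.test(i) ? -1 : next++;
                }
            }
            return docMap;
        }
    }

    public static void main(String[] args) {
        // 10 documents, with documents 3 and 7 deleted.
        SegmentInfoSketch info = new SegmentInfoSketch(10, true, d -> d == 3 || d == 7);
        // Term enumeration (the search path) never calls getDocMap().
        System.out.println("builds before merge: " + info.buildCount); // 0
        int[] map = info.getDocMap();
        info.getDocMap(); // second call reuses the cached array
        System.out.println("builds after merge: " + info.buildCount);  // 1
        System.out.println(Arrays.toString(map)); // [0, 1, 2, -1, 3, 4, 5, -1, 6, 7]
    }
}
```

      Because only the merge path asks for the map, searches never pay the O(maxDoc) loop. The cached field is not synchronized; the sketch assumes each instance is used by a single thread.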

      Performance Results:
      A simple single-field index with 555,555 documents and 1,000 random deletions was queried 1,000 times with a PrefixQuery matching a single document.

      Performance Before Patch:
      indexing time = 121,656 ms
      querying time = 58,812 ms

      Performance After Patch:
      indexing time = 121,000 ms
      querying time = 598 ms

      A roughly 100-fold improvement in query time (58,812 ms / 598 ms ≈ 98)!

      All Lucene unit tests pass.

      1. docMap.txt
        2 kB
        Yonik Seeley
      2. docMap.txt
        2 kB
        Yonik Seeley

        Activity

        Yonik Seeley added a comment -

        attaching patch

        Yonik Seeley added a comment -

        Also deferred creation of SegmentMergeInfo.postings (TermPositions) for another 15% gain.

        Same index and query were used to test, but this time 100,000 query iterations.

        defer docMap only:
        indexing time = 121,734 ms
        querying time = 18,266 ms

        defer docMap and postings:
        indexing time = 120,860 ms
        querying time = 15,625 ms
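        Deferring the postings follows the same pattern. Again this is a hypothetical sketch, not the patch itself: `PositionsCursor` stands in for a TermPositions instance, and the `Supplier` stands in for whatever opens it on the segment's reader. Nothing is opened at construction time; the first `getPositions()` call opens the cursor and later calls reuse it.

```java
import java.util.function.Supplier;

public class LazyPostingsDemo {

    /** Hypothetical stand-in for an expensive per-segment positions cursor. */
    static final class PositionsCursor {
        static int opened = 0; // counts how many cursors were created
        PositionsCursor() { opened++; }
    }

    /** Hypothetical stand-in for a merge-info object with deferred postings. */
    static final class SegmentInfoSketch {
        private final Supplier<PositionsCursor> opener; // hypothetical factory
        private PositionsCursor postings;               // null until first requested

        SegmentInfoSketch(Supplier<PositionsCursor> opener) {
            this.opener = opener;
        }

        /** Opens the cursor on first use and caches it for later calls. */
        PositionsCursor getPositions() {
            if (postings == null) {
                postings = opener.get();
            }
            return postings;
        }
    }

    public static void main(String[] args) {
        PositionsCursor.opened = 0;
        SegmentInfoSketch[] infos = new SegmentInfoSketch[3];
        for (int i = 0; i < infos.length; i++) {
            infos[i] = new SegmentInfoSketch(PositionsCursor::new);
        }
        // Constructing the per-segment objects opens nothing.
        System.out.println("opened after construction: " + PositionsCursor.opened); // 0
        infos[0].getPositions();
        infos[0].getPositions();
        System.out.println("opened after two requests on one segment: " + PositionsCursor.opened); // 1
    }
}
```

        One consequence of this design is that any cleanup code must handle the case where the cursor was never opened (a null field).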


          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Yonik Seeley
          • Votes:
            6
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
