Lucene - Core / LUCENE-7905

Optimizations for OrdinalMap

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.1
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      OrdinalMap is a useful class for quickly mapping per-segment ordinals to a global ordinal space, but it's fairly costly to build, and it must typically be rebuilt on every NRT refresh.
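      To make the mapping concrete, here is a minimal Java sketch of what an ordinal map computes, assuming each segment's unique terms are given as a sorted String array, so a term's array index is its per-segment ordinal. The class NaiveOrdinalMap and its build method are hypothetical names for illustration, and this naive union-then-binary-search approach is not how Lucene's OrdinalMap is implemented.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative only (hypothetical class, not Lucene's OrdinalMap):
// each segment exposes its unique terms in sorted order, and a term's
// index in that array is its per-segment ordinal.
public class NaiveOrdinalMap {

  /** Returns segmentToGlobal[seg][segOrd] = global ordinal of that term. */
  static long[][] build(String[][] segmentTerms) {
    // 1. Sorted union of all terms: index in this array = global ordinal.
    //    (Uses String order for simplicity; Lucene compares terms as bytes.)
    TreeSet<String> union = new TreeSet<>();
    for (String[] terms : segmentTerms) {
      union.addAll(Arrays.asList(terms));
    }
    String[] globalTerms = union.toArray(new String[0]);

    // 2. For every per-segment term, binary-search its global position.
    long[][] segmentToGlobal = new long[segmentTerms.length][];
    for (int seg = 0; seg < segmentTerms.length; seg++) {
      String[] terms = segmentTerms[seg];
      segmentToGlobal[seg] = new long[terms.length];
      for (int ord = 0; ord < terms.length; ord++) {
        segmentToGlobal[seg][ord] = Arrays.binarySearch(globalTerms, terms[ord]);
      }
    }
    return segmentToGlobal;
  }

  public static void main(String[] args) {
    String[][] segments = {
      {"apple", "cherry"},          // segment 0: ords 0, 1
      {"banana", "cherry", "date"}  // segment 1: ords 0, 1, 2
    };
    long[][] map = build(segments);
    System.out.println(Arrays.toString(map[0])); // [0, 2]
    System.out.println(Arrays.toString(map[1])); // [1, 2, 3]
  }
}
```

      Global ordinals follow the merged sort order, so "cherry", which appears in both segments under different local ordinals, maps to the same global ordinal 2 from each.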

      I use it heavily in two places, SortedSetDocValuesFacetCounts and another custom usage, and I found some small optimizations that improve its construction time.

      I switched it to merge the terms with a simple priority queue instead of the more general MultiTermsEnum, which does extra work because it must also provide postings, implement seekExact, etc.
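      The priority-queue merge can be sketched as a k-way merge over sorted per-segment term lists. This is a minimal illustration of the general technique, not the patch's code: it assumes in-memory sorted String arrays in place of Lucene's per-segment TermsEnum, and the names PqOrdinalMapSketch, SegmentCursor and build are hypothetical. A real implementation would also store the resulting mapping in a more compact form than plain long arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of a k-way merge of sorted per-segment terms with a priority queue.
// Assumption: each segment's terms are unique and sorted. The merge only ever
// advances each cursor to its next term: no seeking, no postings.
public class PqOrdinalMapSketch {

  /** Cursor over one segment's sorted terms. */
  private static final class SegmentCursor {
    final int segment;
    final String[] terms;
    int ord; // current per-segment ordinal

    SegmentCursor(int segment, String[] terms) {
      this.segment = segment;
      this.terms = terms;
    }

    String current() {
      return terms[ord];
    }
  }

  /** Returns segmentToGlobal[seg][segOrd]; merged terms are appended to globalTermsOut. */
  static long[][] build(String[][] segmentTerms, List<String> globalTermsOut) {
    long[][] segmentToGlobal = new long[segmentTerms.length][];
    PriorityQueue<SegmentCursor> pq =
        new PriorityQueue<>((a, b) -> a.current().compareTo(b.current()));
    for (int seg = 0; seg < segmentTerms.length; seg++) {
      segmentToGlobal[seg] = new long[segmentTerms[seg].length];
      if (segmentTerms[seg].length > 0) {
        pq.add(new SegmentCursor(seg, segmentTerms[seg]));
      }
    }

    long globalOrd = -1;
    String lastTerm = null;
    while (!pq.isEmpty()) {
      SegmentCursor top = pq.poll();
      String term = top.current();
      // A new distinct term gets the next global ordinal; equal terms from
      // other segments are popped consecutively and share it.
      if (!term.equals(lastTerm)) {
        globalOrd++;
        lastTerm = term;
        globalTermsOut.add(term);
      }
      segmentToGlobal[top.segment][top.ord] = globalOrd;
      top.ord++;
      if (top.ord < top.terms.length) {
        pq.add(top); // re-insert, now positioned on its next term
      }
    }
    return segmentToGlobal;
  }

  public static void main(String[] args) {
    String[][] segments = {{"apple", "cherry"}, {"banana", "cherry", "date"}};
    List<String> global = new ArrayList<>();
    long[][] map = build(segments, global);
    System.out.println(global);                  // [apple, banana, cherry, date]
    System.out.println(Arrays.toString(map[0])); // [0, 2]
    System.out.println(Arrays.toString(map[1])); // [1, 2, 3]
  }
}
```

      Each term costs one poll and at most one re-insert, O(log k) for k segments, and the cursors only ever move forward. That narrow access pattern is why a plain priority queue can beat a general merged enum that must also support seeking and postings.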

      I also moved OrdinalMap out of MultiDocValues into its own top-level class in oal.index (org.apache.lucene.index).

      In my construction-time tests, the patch is ~16% faster (159.9 s -> 134.2 s) in one case with 91.4 M terms and ~9% faster (115.6 s -> 105.7 s) in another case with 26.6 M terms.
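      As a quick arithmetic check, the quoted percentages are consistent with reading "faster" as the relative reduction in construction time, (before - after) / before:

```latex
\frac{159.9\,\mathrm{s} - 134.2\,\mathrm{s}}{159.9\,\mathrm{s}} = \frac{25.7}{159.9} \approx 0.161 \approx 16\%
\qquad
\frac{115.6\,\mathrm{s} - 105.7\,\mathrm{s}}{115.6\,\mathrm{s}} = \frac{9.9}{115.6} \approx 0.086 \approx 9\%
```

      Expressed as throughput ratios instead, the same timings correspond to speedups of about 1.19x (159.9 / 134.2) and 1.09x (115.6 / 105.7).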

      Attachments

        1. LUCENE-7905.patch (67 kB, Michael McCandless)
        2. LUCENE-7905-specialized.patch (70 kB, Michael McCandless)
        3. LUCENE-7905.patch (67 kB, Michael McCandless)
        4. LUCENE-7905.patch (67 kB, Michael McCandless)


          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Michael McCandless (mikemccand)
            Votes: 1
            Watchers: 6
