Lucene - Core
LUCENE-5841

Remove FST.Builder.FreezeTail interface

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, Trunk
    • Component/s: core/codecs
    • Labels: None
    • Lucene Fields: New

      Description

      The FST Builder has a crazy-hairy interface called FreezeTail, which is only
      used by BlockTreeTermsWriter to find appropriate prefixes
      (i.e. containing enough terms or sub-blocks) to write term blocks.

      But this is really a silly abuse ... it's cleaner and likely
      faster/less GC for BTTW to compute this itself just by tracking the
      term ordinal where each prefix started in the pending terms/blocks. The
      code is also insanely hairy, and this is at least a baby step to try
      to make it a bit simpler.

      The FreezeTail interface also makes it very hard to experiment with
      different formats at write-time, because you have to get your new
      formats working through it.
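
      The idea of tracking where each prefix started in the pending
      terms/blocks can be sketched roughly as below. This is a simplified
      illustration, not the code from the patch: the class PrefixBlockSketch,
      its methods, and the string-based "blocks" are all hypothetical, it
      works on Java chars rather than term bytes, and it leaves out details
      such as a maximum block size. It assumes terms arrive in sorted order
      and that minItemsInBlock is at least 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assign sorted terms to prefix blocks by remembering,
// for every prefix length of the last term, where that prefix started in the
// pending list. Not the actual BlockTreeTermsWriter implementation.
class PrefixBlockSketch {
    private final int minItemsInBlock;
    // Terms and already-written sub-blocks that still need a parent block.
    private final List<String> pending = new ArrayList<>();
    // prefixStarts[i] = index in pending where the prefix lastTerm[0..i] began.
    private int[] prefixStarts = new int[8];
    private String lastTerm = "";
    // Each "written" block is recorded as a readable string.
    private final List<String> blocks = new ArrayList<>();

    PrefixBlockSketch(int minItemsInBlock) {
        this.minItemsInBlock = minItemsInBlock;
    }

    // Terms must be added in sorted order.
    void add(String term) {
        pushTerm(term);
        pending.add(term);
    }

    // Closes every open prefix and writes whatever is left as the root block.
    List<String> finish() {
        pushTerm("");
        writeBlock(0, pending.size());
        return blocks;
    }

    private void pushTerm(String term) {
        // Length of the prefix shared with the previous term.
        int limit = Math.min(lastTerm.length(), term.length());
        int pos = 0;
        while (pos < limit && lastTerm.charAt(pos) == term.charAt(pos)) {
            pos++;
        }
        // Prefixes longer than pos are finished; visit them longest first.
        for (int i = lastTerm.length() - 1; i >= pos; i--) {
            int count = pending.size() - prefixStarts[i];
            if (count >= minItemsInBlock) {
                writeBlock(i + 1, count);
            }
        }
        if (prefixStarts.length < term.length()) {
            prefixStarts = Arrays.copyOf(prefixStarts, term.length());
        }
        // Every new prefix of this term starts at the current end of pending.
        for (int i = pos; i < term.length(); i++) {
            prefixStarts[i] = pending.size();
        }
        lastTerm = term;
    }

    // Replaces the last `count` pending entries with a single sub-block entry.
    private void writeBlock(int prefixLength, int count) {
        String prefix = lastTerm.substring(0, prefixLength);
        List<String> tail = pending.subList(pending.size() - count, pending.size());
        blocks.add(prefix + "* -> " + new ArrayList<>(tail));
        tail.clear();
        pending.add("<" + prefix + "*>");
    }

    public static void main(String[] args) {
        PrefixBlockSketch sketch = new PrefixBlockSketch(2);
        for (String term : new String[] {"car", "card", "care", "cat", "dog"}) {
            sketch.add(term);
        }
        for (String block : sketch.finish()) {
            System.out.println(block);
        }
    }
}
```

      With minItemsInBlock set to 2, the terms car, card, care, cat, dog
      yield three blocks: "car*" holding car, card and care; "ca*" holding
      the car* sub-block plus cat; and a root block holding the ca*
      sub-block plus dog. No callback into the FST builder is needed; the
      writer only compares each term with the previous one.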

      1. LUCENE-5841.patch
        139 kB
        Michael McCandless
      2. LUCENE-5841.patch
        135 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Patch, fixing BTTW (and its forks) to do their own term -> block
        assignment w/o abusing FST.Builder, and then entirely removing the
        FreezeTail API from FST.Builder.

        Han Jiang added a comment -

        It is really great to see this interface removed!

        Michael McCandless added a comment -

        New patch: I just changed the PendingTerm class to use byte[] instead of BytesRef to hold the term, to save some silly garbage. I think it's ready.

        Also, I ran a "merge intensive" perf test from Rob: first building a geonames index with lots of segments (using NoMergePolicy), and then, using SerialMergeScheduler, measuring how long forceMerge(1) takes. The patch makes this a bit faster: from ~95 seconds for trunk to ~87 seconds with this change, or ~8% faster.

        Robert Muir added a comment -

        Nice results. I see this tail-freezing as a hotspot frequently.

        ASF subversion and git services added a comment -

        Commit 1613161 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1613161 ]

        LUCENE-5841: simplify how block tree terms dict assigns terms to blocks

        ASF subversion and git services added a comment -

        Commit 1613235 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1613235 ]

        LUCENE-5841: simplify how block tree terms dict assigns terms to blocks

        ASF subversion and git services added a comment -

        Commit 1616448 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1616448 ]

        LUCENE-5841: make sure final term blocks are the right size

        ASF subversion and git services added a comment -

        Commit 1616450 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1616450 ]

        LUCENE-5841: make sure final term blocks are the right size


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 3
