Lucene - Core / LUCENE-4677

Use vInt to encode node addresses inside FST

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today we encode node addresses as a fixed-width int, but as a step towards enabling FSTs larger than 2.1 GB, I'd like to encode them as vInt instead.
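For readers unfamiliar with the encoding, here is a minimal sketch of a vInt codec, assuming the usual layout (7 data bits per byte, low-order group first, high bit set when more bytes follow). The class and method names are hypothetical and this is not the code from the patch. It shows why small addresses take fewer than the 4 bytes a fixed int always costs.

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {
    // Encodes a non-negative int using 7 data bits per byte.
    // The high bit of each byte signals that another byte follows.
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits plus continuation flag
            value >>>= 7;
        }
        out.write(value); // final byte, continuation flag clear
        return out.toByteArray();
    }

    // Decodes a value produced by encodeVInt.
    static int decodeVInt(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this value
            }
            shift += 7;
        }
        return value;
    }
}
```

Under this layout, values below 128 take one byte and values below 16384 take two, while the largest int values take five, one more than a fixed int.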

      1. LUCENE-4677.patch
        9 kB
        Michael McCandless
      2. LUCENE-4677.patch
        7 kB
        Michael McCandless
      3. LUCENE-4677.patch
        3 kB
        Michael McCandless

          Activity

          Michael McCandless added a comment -

          Initial patch ... not committable until I add a back-compat layer somehow ... (how come TestBackCompat isn't failing...).

          I tested Kuromoji's TokenInfo FST, temporarily turning off packing: vInt encoding made the non-packed FST ~12% smaller (good!). The packed FST is unchanged in size.

          Then I tested on a bigger FST (an AnalyzingSuggester build of FreeDB's song titles), and the resulting FST is nearly the same size (1.0463 GB for trunk vs. 1.0458 GB with the patch).

          Michael McCandless added a comment -

          New patch, with fixes to TestBackwardsCompatibility to fail w/ the current patch. I gen'd the test index using 40x ...

          Next step is to add back compat to FST.

          Michael McCandless added a comment -

          New patch, w/ back compat layer in place (and TestBackCompat now passes). I think it's ready.
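As an illustration only, a back-compat layer of this kind can be sketched as a version check at read time: older files carry the target address as a fixed 4-byte int, newer ones as a vInt. The class name, method name, and version constants below are hypothetical and are not taken from the patch.

```java
import java.io.DataInputStream;
import java.io.IOException;

public class TargetAddressReader {
    // Hypothetical format versions, chosen for this sketch only.
    static final int VERSION_FIXED_INT = 1;
    static final int VERSION_VINT_TARGET = 2;

    // Reads an arc's target address using the encoding that matches
    // the format version recorded in the file header.
    static int readTarget(DataInputStream in, int version) throws IOException {
        if (version < VERSION_VINT_TARGET) {
            return in.readInt(); // old format: fixed 4-byte big-endian int
        }
        // New format: 7 data bits per byte, high bit means more bytes follow.
        int value = 0;
        int shift = 0;
        while (true) {
            int b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }
}
```

With this shape, the writer only ever emits the newest format, and the reader stays able to load files written before the change.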

          Commit Tag Bot added a comment -

          [trunk commit] Michael McCandless
          http://svn.apache.org/viewvc?view=revision&revision=1432466

          LUCENE-4677: use vInt not int to encode arc's target address in un-packed FSTs

          Commit Tag Bot added a comment -

          [branch_4x commit] Robert Muir
          http://svn.apache.org/viewvc?view=revision&revision=1435141

          LUCENE-4677, LUCENE-4682, LUCENE-4678, LUCENE-3298: Merged /lucene/dev/trunk:r1432459,1432466,1432472,1432474,1432522,1432646,1433026,1433109

          Steve Rowe added a comment -

          Looks like this can be resolved?

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0
              Watchers:
              1
