Lucene - Core
  1. Lucene - Core
  2. LUCENE-4404

Add ListOfOutputs FST Outputs, replacing UpToTwoPositiveIntOutputs

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-3842. This just generalizes the
      UpToTwoPositiveIntOutputs to a list of any arbitrary output, by
      wrapping any other Outputs impl. I also made separate methods to
      write/read a node-final output: since list of values can only occur on
      a final node output, this impl optimizes and avoids writing an extra
      byte per label for normal arc labels.

      This also fixes a bug in Builder that was sometimes failing to join
      multiple outputs together.

      1. LUCENE-4404.patch
        96 kB
        Michael McCandless
      2. LUCENE-4404.patch
        24 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Patch, I think it's ready.

        Show
        Michael McCandless added a comment - Patch, I think it's ready.
        Hide
        Robert Muir added a comment -

        This seems more wasteful than the previous UpToTwo in that it always writes a vInt size?
        Is there some way to avoid this?

        Show
        Robert Muir added a comment - This seems more wasteful than the previous UpToTwo in that it always writes a vInt size? Is there some way to avoid this?
        Hide
        Michael McCandless added a comment -

        That's a good point about the bit-stealing: I forgot it did that. So I agree the new ListOfOutputs will likely make a larger FST ... but it shouldn't be so bad since it's only on the node-final outputs that it adds a byte. Also it's more general (can wrap any output (not just PositiveInt), can hold an arbitrary number).

        So I'll revive UpToTwoPositiveIntOutputs, and move it to misc (nobody actually uses it today... I think BlockTree had used it at one point), and put this new ListOfOutputs into misc too.

        Show
        Michael McCandless added a comment - That's a good point about the bit-stealing: I forgot it did that. So I agree the new ListOfOutputs will likely make a larger FST ... but it shouldn't be so bad since it's only on the node-final outputs that it adds a byte. Also it's more general (can wrap any output (not just PositiveInt), can hold an arbitrary number). So I'll revive UpToTwoPositiveIntOutputs, and move it to misc (nobody actually uses it today... I think BlockTree had used it at one point), and put this new ListOfOutputs into misc too.
        Hide
        Robert Muir added a comment -

        I agree, lets put both in misc/. The new one is good because its much more flexible.

        Show
        Robert Muir added a comment - I agree, lets put both in misc/. The new one is good because its much more flexible.
        Hide
        Michael McCandless added a comment -

        New patch, I think it's ready.

        I resurrected UpToTwoPositiveInts, in lucene/misc, and moved ListOfOutputs there too. And I refactored the reusable parts of TestFSTs into test-framework, and created TestFSTsMisc to test these two output impls.

        Show
        Michael McCandless added a comment - New patch, I think it's ready. I resurrected UpToTwoPositiveInts, in lucene/misc, and moved ListOfOutputs there too. And I refactored the reusable parts of TestFSTs into test-framework, and created TestFSTsMisc to test these two output impls.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1388942

        LUCENE-4404: add ListOfOutputs for FST to hold more than one output per input

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1388942 LUCENE-4404 : add ListOfOutputs for FST to hold more than one output per input
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development