Lucene - Core
  1. Lucene - Core
  2. LUCENE-5028

doShare is pointless in PositiveIntOutputs

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: core/FSTs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We recently use this in oal.core.fst.PositiveIntOutputs to indicate whether to share outputs. The comment mentioned 'with doShare=false, in some case this may result in a smaller FST'. However, this is not intuitive, as for long type, we always have the smallest output reduced to NO_OUTPUT, thus the smallest one is 'moved' towards root, and no extra output is created.

      However, if there are many many small outputs around root arcs, when we share outputs, a large output might be pushed into the root arcs. When root arcs are packed as fixed-array, yes the size of FST is increased. But, I suppose this should invoke other intuitive heuristics, instead of the confusing 'doShare'?

      Besides, this only exist in PositiveIntOutputs.

      1. LUCENE-5028.patch
        2 kB
        Han Jiang
      2. LUCENE-5028.patch
        23 kB
        Han Jiang
      3. LUCENE-5028.patch
        23 kB
        Han Jiang

        Activity

        Hide
        Robert Muir added a comment -

        I think instead of having this boolean in PositiveIntOutputs, if someone wants a non-sharing outputs impl, it should just be a different outputs implementation?

        if anything is using this / anyone is upset by this, we could add one to the sandbox.

        Show
        Robert Muir added a comment - I think instead of having this boolean in PositiveIntOutputs, if someone wants a non-sharing outputs impl, it should just be a different outputs implementation? if anything is using this / anyone is upset by this, we could add one to the sandbox.
        Hide
        Michael McCandless added a comment -

        +1 to nuke it!

        I think, except for array-arc effects, the FST is never smaller with doShare=false (I had thought it was but now I disagree with my past self!).

        Han do you wanna make a patch? Thanks.

        Show
        Michael McCandless added a comment - +1 to nuke it! I think, except for array-arc effects, the FST is never smaller with doShare=false (I had thought it was but now I disagree with my past self!). Han do you wanna make a patch? Thanks.
        Hide
        Han Jiang added a comment -

        yes, I think this should be kept intuitive for general usage.

        I take a glimpse of the grep result, strange that we use doShare=false in codecs.simpletext.SimpleTextFieldsReader

        Show
        Han Jiang added a comment - yes, I think this should be kept intuitive for general usage. I take a glimpse of the grep result, strange that we use doShare=false in codecs.simpletext.SimpleTextFieldsReader
        Hide
        Han Jiang added a comment -

        ok, patch, tests pass. also updated related javadocs. sorry my laptop is slow on tests

        Show
        Han Jiang added a comment - ok, patch, tests pass. also updated related javadocs. sorry my laptop is slow on tests
        Hide
        Michael McCandless added a comment -

        Patch looks great! Maybe also remove doShare from UpToTwoPositiveIntOutputs?

        Show
        Michael McCandless added a comment - Patch looks great! Maybe also remove doShare from UpToTwoPositiveIntOutputs?
        Hide
        Han Jiang added a comment -

        hmm, maybe not? I suppose, for TwoLong case, when the first long is 'moved' towards root and minus to zero, the second one might still be greater than zero? So, when output is shared, we might create an extra output.

        Show
        Han Jiang added a comment - hmm, maybe not? I suppose, for TwoLong case, when the first long is 'moved' towards root and minus to zero, the second one might still be greater than zero? So, when output is shared, we might create an extra output.
        Hide
        Han Jiang added a comment -

        the second patch also removes a silly equal check in PositiveIntOutputs

        Show
        Han Jiang added a comment - the second patch also removes a silly equal check in PositiveIntOutputs
        Hide
        Han Jiang added a comment - - edited

        The third one nukes 'doShare' in UpToTwoPositiveIntOutputs as well, since those codes are in 'misc', and doShare=false sometimes makes sense, I'm not sure whether this is the right way

        If we're going to totally omit 'doShare', maybe we need some javadocs in fst.Builder, so that this 'sharing increases fst size' is not forgotten?

        Show
        Han Jiang added a comment - - edited The third one nukes 'doShare' in UpToTwoPositiveIntOutputs as well, since those codes are in 'misc', and doShare=false sometimes makes sense, I'm not sure whether this is the right way If we're going to totally omit 'doShare', maybe we need some javadocs in fst.Builder, so that this 'sharing increases fst size' is not forgotten?
        Hide
        Michael McCandless added a comment -

        Hmm, I agree it's not clear that doShare is useless on UpToTwoPositiveIntOutputs. Let's just leave that one for now then? I'll look at the 2nd patch soon ... but I'll be offline for next ~36 hours or so ...

        Show
        Michael McCandless added a comment - Hmm, I agree it's not clear that doShare is useless on UpToTwoPositiveIntOutputs. Let's just leave that one for now then? I'll look at the 2nd patch soon ... but I'll be offline for next ~36 hours or so ...
        Hide
        Michael McCandless added a comment -

        The 2nd patch looks great, I'll commit shortly! Thanks Han.

        I'll also go improve the javadocs in UpToTwo/List outputs impls...

        Show
        Michael McCandless added a comment - The 2nd patch looks great, I'll commit shortly! Thanks Han. I'll also go improve the javadocs in UpToTwo/List outputs impls...
        Hide
        Michael McCandless added a comment -

        Thanks Han!

        Show
        Michael McCandless added a comment - Thanks Han!
        Hide
        Han Jiang added a comment -

        Mike, thanks for the commit!

        Show
        Han Jiang added a comment - Mike, thanks for the commit!
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            Han Jiang
            Reporter:
            Han Jiang
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development