Lucene - Core
  1. Lucene - Core
  2. LUCENE-5670

org.apache.lucene.util.fst.FST should skip over outputs it is not interested in

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.7
    • Fix Version/s: 4.9, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Currently the FST uses the read(DataInput) method from the Outputs class to skip over outputs it actually is not interested in. For most use cases this just creates some additional objects that are immediately destroyed again.

      When traversing an FST with non-trivial data however this can easily add up to several excess objects that nobody actually ever read.

      1. LUCENE-5670.patch
        7 kB
        Christian Ziech
      2. skipOutput_lucene48.patch
        8 kB
        Christian Ziech

        Activity

        Hide
        Christian Ziech added a comment -

        Attached an (untested) patch where a "skipOutput" method is added to the outputs which doesn't create excess objects. Default implementation is the current behavior by invoking the read() method.

        Also a skipBytes(int) method was added to the DataInput which defaults to reading the data as before. Several implementations of the DataInput already had a skipBytes() method and now effectively implement it.

        Show
        Christian Ziech added a comment - Attached an (untested) patch where a "skipOutput" method is added to the outputs which doesn't create excess objects. Default implementation is the current behavior by invoking the read() method. Also a skipBytes(int) method was added to the DataInput which defaults to reading the data as before. Several implementations of the DataInput already had a skipBytes() method and now effectively implement it.
        Hide
        Michael McCandless added a comment -

        This looks like a good idea; why did you need to add DataInput.skipBytes? Don't we already have that method (taking long)?

        Show
        Michael McCandless added a comment - This looks like a good idea; why did you need to add DataInput.skipBytes? Don't we already have that method (taking long)?
        Hide
        Christian Ziech added a comment -

        No actually only some of the subclasses of DataInput had a skipBytes() implementation - e.g. the BytesReader() intermediate abstract class added it to the interface and also the ByteArrayDataInput had it before. Maybe one should scan over all the other implementations if they had a similar method that was just named differently or could implement it (e.g. IndexInput could easily implement the skip method as a comination of seek and getFilePointer).

        Show
        Christian Ziech added a comment - No actually only some of the subclasses of DataInput had a skipBytes() implementation - e.g. the BytesReader() intermediate abstract class added it to the interface and also the ByteArrayDataInput had it before. Maybe one should scan over all the other implementations if they had a similar method that was just named differently or could implement it (e.g. IndexInput could easily implement the skip method as a comination of seek and getFilePointer).
        Hide
        Michael McCandless added a comment -

        Sorry, I wasn't talking about the subclasses of DataInput, I was talking about DataInput.java itself: it already has a skipBytes(long) defined but your patch adds a skipBytes(int)?

        e.g. IndexInput could easily implement the skip method as a comination of seek and getFilePointer

        That's right! Seems silly not to. Let's add it here?

        Show
        Michael McCandless added a comment - Sorry, I wasn't talking about the subclasses of DataInput, I was talking about DataInput.java itself: it already has a skipBytes(long) defined but your patch adds a skipBytes(int)? e.g. IndexInput could easily implement the skip method as a comination of seek and getFilePointer That's right! Seems silly not to. Let's add it here?
        Hide
        Christian Ziech added a comment - - edited

        Oh right! I only checked the 4.7 branch and there the DataInput didn't have the skipBytes() method yet. But now I saw that both trunk and the 4.8 branch have the skipBytes(long) already. So yes of course in that case we can drop it from the patch. If we can get consensus that the rest of the patch is worth doing I could implement it against 4.8 and attach it here.

        Edit: The ticket that added the skipBytes to the DataInput was LUCENE-5583

        Show
        Christian Ziech added a comment - - edited Oh right! I only checked the 4.7 branch and there the DataInput didn't have the skipBytes() method yet. But now I saw that both trunk and the 4.8 branch have the skipBytes(long) already. So yes of course in that case we can drop it from the patch. If we can get consensus that the rest of the patch is worth doing I could implement it against 4.8 and attach it here. Edit: The ticket that added the skipBytes to the DataInput was LUCENE-5583
        Hide
        Michael McCandless added a comment -

        Ahh, sorry, DataInputs.skipBytes new in 4.8. Too many versions

        I think this patch is worth doing.

        Show
        Michael McCandless added a comment - Ahh, sorry, DataInputs.skipBytes new in 4.8. Too many versions I think this patch is worth doing.
        Hide
        Christian Ziech added a comment -

        Attaching a patch relative to the lucene 4.8 branch.

        Show
        Christian Ziech added a comment - Attaching a patch relative to the lucene 4.8 branch.
        Hide
        Michael McCandless added a comment -

        Thanks Christian, the patch looks good to me! I'll commit soon.

        Show
        Michael McCandless added a comment - Thanks Christian, the patch looks good to me! I'll commit soon.
        Hide
        ASF subversion and git services added a comment -

        Commit 1596368 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1596368 ]

        LUCENE-5670: add skip/FinalOutput to FST Outputs

        Show
        ASF subversion and git services added a comment - Commit 1596368 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1596368 ] LUCENE-5670 : add skip/FinalOutput to FST Outputs
        Hide
        ASF subversion and git services added a comment -

        Commit 1596369 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1596369 ]

        LUCENE-5670: add skip/FinalOutput to FST Outputs

        Show
        ASF subversion and git services added a comment - Commit 1596369 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1596369 ] LUCENE-5670 : add skip/FinalOutput to FST Outputs
        Hide
        Michael McCandless added a comment -

        Thanks Christian!

        Show
        Michael McCandless added a comment - Thanks Christian!

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Christian Ziech
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development