Lucene - Core
  1. Lucene - Core
  2. LUCENE-4230

When pulling DocsAndPositionsEnum you should state whether you need payloads

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Payloads are extra-costly today because when pulling a
      DocsAndPositionsEnum, the codec has no way to know if you need access
      to the payloads. Tracking the payloads, even if the app never
      retrieves them, is often costly...

      1. LUCENE-4230.patch
        109 kB
        Robert Muir
      2. LUCENE-4230.patch
        109 kB
        Robert Muir
      3. LUCENE-4230.patch
        103 kB
        Robert Muir
      4. LUCENE-4230.patch
        101 kB
        Robert Muir
      5. LUCENE-4230.patch
        99 kB
        Robert Muir
      6. LUCENE-4230.patch
        102 kB
        Michael McCandless
      7. LUCENE-4230.patch
        93 kB
        Robert Muir
      8. LUCENE-4230.patch
        90 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          Initial patch, just with a rote cutover of the current boolean
          needsOffsets to bit wise flags instead; next I'll at a flag for
          payloads and (carefully) fix all places that pull D&PEnum to state
          whether they need payloads...

          Show
          Michael McCandless added a comment - Initial patch, just with a rote cutover of the current boolean needsOffsets to bit wise flags instead; next I'll at a flag for payloads and (carefully) fix all places that pull D&PEnum to state whether they need payloads...
          Hide
          Robert Muir added a comment -

          I don't like POS_NONE.

          Can we just have a final method: d&pEnum(LiveDocs, Reuse) that calls d&pEnum(LiveDocs, Reuse, 0) ?

          Show
          Robert Muir added a comment - I don't like POS_NONE. Can we just have a final method: d&pEnum(LiveDocs, Reuse) that calls d&pEnum(LiveDocs, Reuse, 0) ?
          Hide
          Robert Muir added a comment -

          Also I think the flags belong on DocsAndPositionsEnum, not TermsEnum.

          Show
          Robert Muir added a comment - Also I think the flags belong on DocsAndPositionsEnum, not TermsEnum.
          Hide
          Michael McCandless added a comment -

          Can we just have a final method: d&pEnum(LiveDocs, Reuse) that calls d&pEnum(LiveDocs, Reuse, 0) ?

          +1

          Also I think the flags belong on DocsAndPositionsEnum, not TermsEnum.

          +1

          Show
          Michael McCandless added a comment - Can we just have a final method: d&pEnum(LiveDocs, Reuse) that calls d&pEnum(LiveDocs, Reuse, 0) ? +1 Also I think the flags belong on DocsAndPositionsEnum, not TermsEnum. +1
          Hide
          Robert Muir added a comment -

          I'll take a stab at updating the patch. This is a good change.

          Show
          Robert Muir added a comment - I'll take a stab at updating the patch. This is a good change.
          Hide
          Robert Muir added a comment -

          updated patch (needs testing)

          Show
          Robert Muir added a comment - updated patch (needs testing)
          Hide
          Michael McCandless added a comment -

          +1 looks great! Maybe improve the javadocs for the flags to link to TermsEnum#docsAndPositionsEnum? (Hmm and IR's sugar too).

          Show
          Michael McCandless added a comment - +1 looks great! Maybe improve the javadocs for the flags to link to TermsEnum#docsAndPositionsEnum? (Hmm and IR's sugar too).
          Hide
          Michael McCandless added a comment -

          New patch, making the default docsAndPositions return payloads and
          offsets, and then caller can "optimize" if they don't need either or
          both (I optimized MultiPhraseQuery/PhraseQuery/SloppyPhraseQuery).

          Also, we no longer return null if the caller wants offsets but they
          were not indexed (caller can consult FieldInfo if they need to know),
          and instead return -1 (offsets) and null (payloads) if they weren't
          indexed.

          Separately we can optimize other places that don't need payloads.

          Show
          Michael McCandless added a comment - New patch, making the default docsAndPositions return payloads and offsets, and then caller can "optimize" if they don't need either or both (I optimized MultiPhraseQuery/PhraseQuery/SloppyPhraseQuery). Also, we no longer return null if the caller wants offsets but they were not indexed (caller can consult FieldInfo if they need to know), and instead return -1 (offsets) and null (payloads) if they weren't indexed. Separately we can optimize other places that don't need payloads.
          Hide
          Robert Muir added a comment -

          updated patch: I simplified tests to just call dpEnum(Bits, reuse) instead of passing flags.

          Basically I think tests should just work on 'whats there', and we should fix newField such that we sometimes 'add extra stuff in addition to what the test needs'.

          This not only simplifies our code but will maximize test coverage. I'm resisting the urge to beef up TestDuelingCodecs... another issue for that

          Separately I added some nocommits while reviewing the code: places where we are still conditionalizing the code based on a null result (which wont happen).

          Show
          Robert Muir added a comment - updated patch: I simplified tests to just call dpEnum(Bits, reuse) instead of passing flags. Basically I think tests should just work on 'whats there', and we should fix newField such that we sometimes 'add extra stuff in addition to what the test needs'. This not only simplifies our code but will maximize test coverage. I'm resisting the urge to beef up TestDuelingCodecs... another issue for that Separately I added some nocommits while reviewing the code: places where we are still conditionalizing the code based on a null result (which wont happen).
          Hide
          Robert Muir added a comment -

          cleared some nocommits, 4 left.

          Show
          Robert Muir added a comment - cleared some nocommits, 4 left.
          Hide
          Robert Muir added a comment -

          one nocommit left: checkindex. I'll review it and see if i can clear it up

          Show
          Robert Muir added a comment - one nocommit left: checkindex. I'll review it and see if i can clear it up
          Hide
          Robert Muir added a comment -

          all nocommits gone

          Show
          Robert Muir added a comment - all nocommits gone
          Hide
          Robert Muir added a comment -

          beasting found a bug in the porting of TermVectorComponent... I fixed this.

          Show
          Robert Muir added a comment - beasting found a bug in the porting of TermVectorComponent... I fixed this.
          Hide
          Michael McCandless added a comment -

          +1 looks great! Thanks Robert.

          Show
          Michael McCandless added a comment - +1 looks great! Thanks Robert.

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development