Lucene - Core
  LUCENE-4273

Fix DocsEnum freq flag consistent with DocsAndPositionsEnum flags

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Followup from LUCENE-4230

      To pull freq() from a DocsEnum today, you have to pass a boolean 'needsFreqs=true'. If the field omits term frequencies (omitsTF), the call returns null, and you need to call it again with 'needsFreqs=false', in which case you are not supposed to call freq().

      We fixed this for DocsAndPositionsEnum in LUCENE-4230: because you can tell from the fieldinfos whether the requested data is there or not, there is no need to return null, which only makes consumer code complicated.

      So this issue is just to have docs(Bits, reuse), which calls docs(Bits, reuse, FLAG_FREQS) by default. If the freqs aren't there, the DocsEnum returns 1 for freq().

      So calling docs(Bits, reuse, 0) is just an optimization hint to the codec that you never need freqs (same as the payload/offset flags for docsAndPositions).
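
The proposed behavior can be sketched with a small stand-alone model. This is not Lucene code: SketchDocsEnum and SketchTerm are hypothetical classes invented for illustration, the Bits and reuse parameters are left out, and only the flag handling described above is modeled. In this sketch freq() also returns 1 when the caller passed flags=0; what a real codec returns in that case is not stated in the description.

```java
/** Simplified stand-in for a postings iterator; NOT the real Lucene DocsEnum. */
final class SketchDocsEnum {
    static final int FLAG_NONE = 0;   // hint: caller never needs freqs
    static final int FLAG_FREQS = 1;  // caller may call freq()
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds;
    private final int[] freqs; // null: not indexed, or the caller did not ask for them
    private int idx = -1;

    SketchDocsEnum(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    int nextDoc() {
        idx++;
        return idx < docIds.length ? docIds[idx] : NO_MORE_DOCS;
    }

    /** Never fails for missing freqs: falls back to 1 when none were loaded. */
    int freq() {
        return freqs == null ? 1 : freqs[idx];
    }
}

/** Hypothetical holder for one term's postings. */
final class SketchTerm {
    private final int[] docIds;
    private final int[] freqs; // null when the field omits term frequencies

    SketchTerm(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    /** Two-argument style call in the issue: defaults to asking for freqs. */
    SketchDocsEnum docs() {
        return docs(SketchDocsEnum.FLAG_FREQS);
    }

    /** Never returns null; flags only decide whether freqs are loaded. */
    SketchDocsEnum docs(int flags) {
        boolean wantFreqs = (flags & SketchDocsEnum.FLAG_FREQS) != 0;
        return new SketchDocsEnum(docIds, wantFreqs ? freqs : null);
    }

    /** Consumer code needs no null check and no second docs() call. */
    static long sumFreqs(SketchDocsEnum docsEnum) {
        long sum = 0;
        while (docsEnum.nextDoc() != SketchDocsEnum.NO_MORE_DOCS) {
            sum += docsEnum.freq();
        }
        return sum;
    }
}
```

The point of the design is visible in sumFreqs: the consumer iterates the same way whether or not the field indexed frequencies, instead of branching on a null return.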

      1. LUCENE-4273.patch
        125 kB
        Robert Muir
      2. LUCENE-4273.patch
        125 kB
        Michael McCandless
      3. LUCENE-4273.patch
        122 kB
        Robert Muir

        Activity

        Michael McCandless added a comment -

        Patch looks great!

        I tweaked javadocs and fixed one NPE in DirectPostingsFormat ... I think it's ready.

        Robert Muir added a comment -

        Thanks Mike!

        Updated patch with more fixes to CheckIndex. There were actually several problems (leniency):

        • we weren't validating the sum of term frequencies against sumTotalTF in the case of DOCS_AND_FREQS
        • we weren't validating the sum of term frequencies against sumTotalTF in the case of term vectors (there is no way yet to omit them there).
        • we weren't necessarily reading offsets from term vectors, because we were going by the fieldinfos for the postings.
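
The first two checks amount to recomputing the sum of term frequencies while walking the postings and comparing it with the statistic recorded for the field. A minimal sketch of that comparison follows, assuming a toy in-memory layout. SumTotalTermFreqCheck and its methods are hypothetical names, not part of CheckIndex.

```java
/** Toy consistency check; NOT the real CheckIndex implementation. */
final class SumTotalTermFreqCheck {

    /** freqsPerTerm[t][d] is the frequency of term t in its d-th posting. */
    static long sumOfFreqs(int[][] freqsPerTerm) {
        long sum = 0;
        for (int[] postings : freqsPerTerm) {
            for (int freq : postings) {
                sum += freq;
            }
        }
        return sum;
    }

    /** Throws if the recorded statistic disagrees with what the postings contain. */
    static void verify(String field, int[][] freqsPerTerm, long recordedSumTotalTermFreq) {
        long computed = sumOfFreqs(freqsPerTerm);
        if (computed != recordedSumTotalTermFreq) {
            throw new IllegalStateException("field " + field + ": sumTotalTermFreq="
                + recordedSumTotalTermFreq + " but postings sum to " + computed);
        }
    }
}
```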

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            2