Lucene - Core / LUCENE-2625

IndexReader.termDocs() retrieves no documents

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.1
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The TermDocs object returned by indexReader.termDocs() retrieves no documents. However, the documents are retrieved correctly when using indexReader.termDocs(Term), indexReader.termDocs(null) and indexSearcher.search(Query).

      1. TestTermDocs.java
        2 kB
        Adriano Crestani
      2. LUCENE-2625.patch
        0.7 kB
        Robert Muir

        Activity

        Adriano Crestani created issue -
        Adriano Crestani added a comment -

        This test case reproduces the problem. It fails using branch_3x rev 989949, but passes fine using lucene 3.0.2

        Adriano Crestani made changes -
        Field Original Value New Value
        Attachment TestTermDocs.java [ 12453187 ]
        Shai Erera added a comment -

        You add the field as NOT_ANALYZED, therefore there is no indexed term "field:value". Does the first assert succeed? I'd assume it should return false.

        Robert Muir added a comment -

        Hi, I'm not sure about this one being a bug:

        1. the test calls termDocs(), which is unpositioned
        2. the test then calls next(), but it never positions it with seek()

        so in my opinion calling next() on an uninitialized iterator can return wrong results, throw an exception, or return nothing at all.
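
A minimal sketch of the contract described above, assuming a toy in-memory postings map rather than Lucene's actual implementation. The names `ToyTermDocs`, `UnpositionedDemo` and `countDocs` are hypothetical. In Lucene 3.x terms, the two cases roughly correspond to calling `next()` straight after `reader.termDocs()` versus calling `seek(term)` first. The toy cursor simply returns nothing when unpositioned, which is only one of the possible outcomes mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an unpositioned postings cursor (hypothetical, not Lucene's TermDocs). */
class ToyTermDocs {
    private final Map<String, int[]> postings; // term -> doc ids
    private int[] current = new int[0];        // stays empty until seek() positions the cursor
    private int pos = -1;

    ToyTermDocs(Map<String, int[]> postings) {
        this.postings = postings;
    }

    /** Positions the cursor on a term; an unknown term leaves it with no docs. */
    void seek(String term) {
        int[] docs = postings.get(term);
        current = (docs != null) ? docs : new int[0];
        pos = -1;
    }

    /** Advances to the next doc; returns false when exhausted or never positioned. */
    boolean next() {
        if (pos + 1 < current.length) {
            pos++;
            return true;
        }
        return false;
    }

    /** Current doc id; only meaningful after next() has returned true. */
    int doc() {
        return current[pos];
    }
}

class UnpositionedDemo {
    /** Counts how many docs the cursor yields from its current position. */
    static int countDocs(ToyTermDocs td) {
        int n = 0;
        while (td.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("field:value", new int[] {0, 2, 5});

        ToyTermDocs unpositioned = new ToyTermDocs(postings);
        System.out.println(countDocs(unpositioned)); // prints 0: next() without seek()

        ToyTermDocs positioned = new ToyTermDocs(postings);
        positioned.seek("field:value");
        System.out.println(countDocs(positioned));   // prints 3: seek() first, then next()
    }
}
```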

        Adriano Crestani added a comment -

        Hi Shai,

        Sorry, but what do you mean by no field:value indexed? NOT_ANALYZED means the term will be indexed as is.

        And yes, all assertions pass fine but the last one, which only fails using lucene 3x rev 989949

        Shai Erera added a comment -

        Sorry, I misread NOT_ANALYZED as NO. Too early for me.

        I agree w/ Robert though. Calling termDocs() w/o first positioning it (by seeking) is meaningless. If it returned false/true in 2.9, I don't think it matters?

        Adriano Crestani added a comment -

        Thanks for the quick reply Shai and Robert.

        It makes sense now, but it should be documented somewhere, mainly because the behavior has changed from 3.0 to 3.1. Everybody who, since version 2, has iterated over all documents that way still expects it to work without calling seek; at least I did.

        In my opinion, being unpositioned means the user needs to be aware about docs being retrieved in any order, all that matters after all is to iterate over all documents in the index.

        Another question is, if indexReader.termDocs(null) is used, where is it positioned if nothing was defined in the parameter? Shouldn't it work as termDocs()? It just feels inconsistent to me.

        Adriano Crestani added a comment -

        Just confirmed here, invoking seek does fix the problem

        Robert Muir added a comment -

        "In my opinion, being unpositioned means the user needs to be aware about docs being retrieved in any order, all that matters after all is to iterate over all documents in the index."

        termDocs() is unpositioned, implying you will seek() it yourself with Term/TermEnum

        "Another question is, if indexReader.termDocs(null) is used, where is it positioned if nothing was defined in the parameter? Shouldn't it work as termDocs()? It just feels inconsistent to me."

        termDocs(term) is like termDocs() + seek(term), except for the special null case as listed in the docs (If term is null, then all non-deleted docs are returned with freq=1)

        I'm inclined to agree termDocs(null) is inconsistent because it doesn't work like termDocs() + seek(null), but instead returns the wacky AllTermDocs
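
The three cases discussed here can be sketched with a toy model, assuming a small in-memory index. `ToyIndex` and its methods are hypothetical and are not Lucene's API; term frequencies are not modeled. The sketch only mirrors the behavior as described in this thread: the no-argument form yields nothing until positioned, the term form behaves like "create, then seek", and the null form enumerates every non-deleted document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory index (hypothetical) contrasting the three termDocs variants. */
class ToyIndex {
    private final Map<String, int[]> postings = new HashMap<>();
    private final boolean[] deleted; // deleted[doc] == true means the doc is deleted

    ToyIndex(int maxDoc) {
        deleted = new boolean[maxDoc];
    }

    void add(String term, int... docs) {
        postings.put(term, docs);
    }

    void delete(int doc) {
        deleted[doc] = true;
    }

    /** Unpositioned variant: with no seek, the toy model yields nothing. */
    List<Integer> termDocs() {
        return new ArrayList<>();
    }

    /** Term variant: like creating a cursor and seeking it; null means all live docs. */
    List<Integer> termDocs(String term) {
        List<Integer> out = new ArrayList<>();
        if (term == null) {
            // Special case: enumerate every non-deleted doc, regardless of terms.
            for (int d = 0; d < deleted.length; d++) {
                if (!deleted[d]) {
                    out.add(d);
                }
            }
            return out;
        }
        int[] hits = postings.get(term);
        if (hits != null) {
            for (int d : hits) {
                if (!deleted[d]) {
                    out.add(d);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex(4);
        idx.add("field:value", 0, 2, 3);
        idx.delete(2);

        System.out.println(idx.termDocs());              // prints []
        System.out.println(idx.termDocs("field:value")); // prints [0, 3]
        System.out.println(idx.termDocs(null));          // prints [0, 1, 3]
    }
}
```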

        Robert Muir added a comment -

        javadocs patch reminding you to first seek the unpositioned termdocs

        Robert Muir made changes -
        Attachment LUCENE-2625.patch [ 12457510 ]
        Michael McCandless added a comment -

        javadoc patch looks good Robert!

        Robert Muir made changes -
        Assignee Robert Muir [ rcmuir ]
        Robert Muir added a comment -

        Thanks for bringing this up Adriano!

        Robert Muir made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Mark Thomas made changes -
        Workflow jira [ 12519062 ] Default workflow, editable Closed status [ 12563697 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563697 ] jira [ 12585257 ]
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Grant Ingersoll made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition           Time In Source Status   Execution Times   Last Executer     Last Execution Date
        Open -> Resolved     53d 12h 40m             1                 Robert Muir       19/Oct/10 13:14
        Resolved -> Closed   162d 3h 35m             1                 Grant Ingersoll   30/Mar/11 16:50

          People

          • Assignee:
            Robert Muir
            Reporter:
            Adriano Crestani
          • Votes:
            0
            Watchers:
            1
