Lucene - Core
  1. Lucene - Core
  2. LUCENE-2736

Wrong implementation of DocIdSetIterator.advance

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.2, 4.0-ALPHA
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Implementations of DocIdSetIterator behave differently when advanced is called. Taking the following test for OpenBitSet, DocIdBitSet and SortedVIntList only SortedVIntList passes the test:

      org.apache.lucene.search.TestDocIdSet.java
      ...
      	public void testAdvanceWithOpenBitSet() throws IOException {
      		DocIdSet idSet = new OpenBitSet( new long[] { 1121 }, 1 );  // bits 0, 5, 6, 10
      		assertAdvance( idSet );
      	}
      
      	public void testAdvanceDocIdBitSet() throws IOException {
      		BitSet bitSet = new BitSet();
      		bitSet.set( 0 );
      		bitSet.set( 5 );
      		bitSet.set( 6 );
      		bitSet.set( 10 );
      		DocIdSet idSet = new DocIdBitSet(bitSet);
      		assertAdvance( idSet );
      	}
      
      	public void testAdvanceWithSortedVIntList() throws IOException {
      		DocIdSet idSet = new SortedVIntList( 0, 5, 6, 10 );
      		assertAdvance( idSet );
      	}	
      
      	private void assertAdvance(DocIdSet idSet) throws IOException {
      		DocIdSetIterator iter = idSet.iterator();
      		int docId = iter.nextDoc();
      		assertEquals( "First doc id should be 0", 0, docId );
      
      		docId = iter.nextDoc();
      		assertEquals( "Second doc id should be 5", 5, docId );
      
      		docId = iter.advance( 5 );
      		assertEquals( "Advancing iterator should return the next doc id", 6, docId );
      	}
      

      The javadoc for advance says:

      Advances to the first beyond the current whose document number is greater than or equal to target.

      This seems to indicate that SortedVIntList behaves correctly, whereas the other two don't.
      Just looking at the DocIdBitSet implementation advance is implemented as:

      bitSet.nextSetBit(target);
      

      where the docs of nextSetBit say:

      Returns the index of the first bit that is set to true that occurs on or after the specified starting index

        Activity

        Robert Muir made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Robert Muir added a comment -

        Bulk closing for 3.2

        Show
        Robert Muir added a comment - Bulk closing for 3.2
        Shai Erera made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Lucene Fields [New] [New, Patch Available]
        Fix Version/s 3.2 [ 12316070 ]
        Fix Version/s 4.0 [ 12314025 ]
        Resolution Fixed [ 1 ]
        Hide
        Shai Erera added a comment -

        Thanks Doron - I changed the javadocs as you suggest.

        Committed revision 1104159 (3x).
        Committed revision 1104167 (trunk).

        Thanks Hardy for reporting that !

        Show
        Shai Erera added a comment - Thanks Doron - I changed the javadocs as you suggest. Committed revision 1104159 (3x). Committed revision 1104167 (trunk). Thanks Hardy for reporting that !
        Hide
        Doron Cohen added a comment -

        Shai, with the modified text the NOTE on "implementations freedom to not advance beyond in some situations" becomes strange... I think that the original text stress the fact the "real intended" behavior is to do advance beyond current, just that for performance reasons the decision whether to advance beyond in some situations is left for implementation decision, and so, if caller provides a target which is not greater than current, it should be aware of this possibility.

        So I think it is perhaps better to either not modify this at all, or at most, to add "(see NOTE below)" just after "beyond":

        -   * Advances to the first beyond the current whose document number is greater
        +   * Advances to the first beyond (see NOTE below) the current whose document number is greater
        

        This would prevent the confusion I think?

        Show
        Doron Cohen added a comment - Shai, with the modified text the NOTE on "implementations freedom to not advance beyond in some situations" becomes strange... I think that the original text stress the fact the "real intended" behavior is to do advance beyond current, just that for performance reasons the decision whether to advance beyond in some situations is left for implementation decision, and so, if caller provides a target which is not greater than current, it should be aware of this possibility. So I think it is perhaps better to either not modify this at all, or at most, to add "(see NOTE below)" just after "beyond": - * Advances to the first beyond the current whose document number is greater + * Advances to the first beyond (see NOTE below) the current whose document number is greater This would prevent the confusion I think?
        Shai Erera made changes -
        Attachment LUCENE-2736.patch [ 12479431 ]
        Hide
        Shai Erera added a comment -

        Patch with Javadocs fixes. I will commit it later today.

        Show
        Shai Erera added a comment - Patch with Javadocs fixes. I will commit it later today.
        Shai Erera made changes -
        Assignee Shai Erera [ shaie ]
        Affects Version/s 3.2 [ 12316070 ]
        Affects Version/s 3.0.2 [ 12314798 ]
        Component/s core/search [ 12310235 ]
        Component/s core/other [ 12310233 ]
        Hide
        Shai Erera added a comment -

        Thanks Hardy for reporting that.

        But I think this works exactly as documented? Note that the javadocs of advance() state "beyond the current whose document number is greater than or equal to target". Also, there's a note in the javadocs:

           * <b>NOTE:</b> when <code> target &le; current</code> implementations may opt 
           * not to advance beyond their current {@link #docID()}.
        

        I think that the word 'beyond' is confusing here. Perhaps we can modify the javadocs to:

        "Advances to the first document whose number is greater than or equal to target"

        If there are no objections, or better wording, I'll commit this later today, but only to 3.2/4.0 and not 3.0.2

        Show
        Shai Erera added a comment - Thanks Hardy for reporting that. But I think this works exactly as documented? Note that the javadocs of advance() state " beyond the current whose document number is greater than or equal to target". Also, there's a note in the javadocs: * <b>NOTE:</b> when <code> target &le; current</code> implementations may opt * not to advance beyond their current {@link #docID()}. I think that the word 'beyond' is confusing here. Perhaps we can modify the javadocs to: "Advances to the first document whose number is greater than or equal to target" If there are no objections, or better wording, I'll commit this later today, but only to 3.2/4.0 and not 3.0.2
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12563772 ] jira [ 12584451 ]
        Mark Thomas made changes -
        Field Original Value New Value
        Workflow jira [ 12525858 ] Default workflow, editable Closed status [ 12563772 ]
        Hardy Ferentschik created issue -

          People

          • Assignee:
            Shai Erera
            Reporter:
            Hardy Ferentschik
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development