Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-1096

Deleting docs of all returned Hits during search causes ArrayIndexOutOfBoundsException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      WindowsXPSP1/Lenovo-T60p

    • Lucene Fields:
      New, Patch Available

      Description

      For background user discussion:
      http://www.nabble.com/document-deletion-problem-to14414351.html

      Hits h = m_indexSearcher.search(q); // Returns 11475 documents 
      for(int i = 0; i < h.length(); i++) 
      { 
        int doc = h.id(i); 
        m_indexSearcher.getIndexReader().deleteDocument(doc);  <-- causes ArrayIndexOutOfBoundsException when i = 6400
      } 
      
      1. TestSearchDelete.java
        3 kB
        Doron Cohen
      2. lucene-1096.patch
        9 kB
        Doron Cohen
      3. lucene-1096.patch
        11 kB
        Doron Cohen

        Activity

        Hide
        doronc Doron Cohen added a comment -

        Committed.

        Show
        doronc Doron Cohen added a comment - Committed.
        Hide
        doronc Doron Cohen added a comment -

        An updated patch:

        • more intensive tests.
        • Hits now compares status between iterations so that it checks
          for hits deletions only if anything changed from previous invocation
          of getMoreDocs().
        • Javadocs updated explaining when ConcurrentModificationException is thrown.
        Show
        doronc Doron Cohen added a comment - An updated patch: more intensive tests. Hits now compares status between iterations so that it checks for hits deletions only if anything changed from previous invocation of getMoreDocs(). Javadocs updated explaining when ConcurrentModificationException is thrown.
        Hide
        doronc Doron Cohen added a comment -

        Patch with tests for the two scenarios described above, and a fix for Hits.

        The fix works as follows:

        • Records the # deleted docs when the search starts.
        • Records the total number of hits as found in the
          first call to getMoreDocs().
        • After every additional call to getMoreDocs(), the # deleted
          docs is checked again.
          If it was modified, the new hits are checked to see if some
          of the old hits are missing, due to deletions in between.
          The start point for copying new hits into the array of hits
          is updated accordingly. Also updated is the totalHits count.
        • Trying to fetch a hit that is not higher than the original # hits
          but is larger than the current number of hits, triggers
          a ConcurrentModificationException, because this hit was
          deleted before we were able to retrieve it.

        Note that when searching not with an IndexSearcher,
        there is no access for counting deletions and so the two
        arrays check is always performed - as part of getMoreDocs().

        All tests pass.

        Show
        doronc Doron Cohen added a comment - Patch with tests for the two scenarios described above, and a fix for Hits. The fix works as follows: Records the # deleted docs when the search starts. Records the total number of hits as found in the first call to getMoreDocs(). After every additional call to getMoreDocs(), the # deleted docs is checked again. If it was modified, the new hits are checked to see if some of the old hits are missing, due to deletions in between. The start point for copying new hits into the array of hits is updated accordingly. Also updated is the totalHits count. Trying to fetch a hit that is not higher than the original # hits but is larger than the current number of hits, triggers a ConcurrentModificationException, because this hit was deleted before we were able to retrieve it. Note that when searching not with an IndexSearcher, there is no access for counting deletions and so the two arrays check is always performed - as part of getMoreDocs(). All tests pass.
        Hide
        doronc Doron Cohen added a comment -

        It seems that this is a serious problem with Hits based search. An application deleting returned hits as in this case will actually get "holes" in the returned docs, every time getMoreDocs() is internally invoked.

        For instance, say all docs match, and the app deletes doc 50. Hits 0 to 99 are ok, they will contain docs 0 to 99. But because of the deletion, hit 100 will be doc 101, i.e. doc 100 was skipped.

        Fixing this behavior is possible in the Hits class if only hits are deleted. I.e. if, while getting hit n, hits m where m<=n are deleted. But if a higher doc is deleted, a doc that would be seen later, after more internal calls to getMoreDocs(), things get more messy, and an ArrayIndexOutOfBound exception can be caused, as in this bug.

        Note that the latter case can also happen if another thread deletes such a "front" hit doc.

        I think the proper exception for the latter case is ConcurrentModificationException, and Hits should be changed to distinguish between the two.

        Show
        doronc Doron Cohen added a comment - It seems that this is a serious problem with Hits based search. An application deleting returned hits as in this case will actually get "holes" in the returned docs, every time getMoreDocs() is internally invoked. For instance, say all docs match, and the app deletes doc 50. Hits 0 to 99 are ok, they will contain docs 0 to 99. But because of the deletion, hit 100 will be doc 101, i.e. doc 100 was skipped. Fixing this behavior is possible in the Hits class if only hits are deleted. I.e. if, while getting hit n, hits m where m<=n are deleted. But if a higher doc is deleted, a doc that would be seen later, after more internal calls to getMoreDocs(), things get more messy, and an ArrayIndexOutOfBound exception can be caused, as in this bug. Note that the latter case can also happen if another thread deletes such a "front" hit doc. I think the proper exception for the latter case is ConcurrentModificationException, and Hits should be changed to distinguish between the two.
        Hide
        doronc Doron Cohen added a comment -

        Test failing with this bug.

        Show
        doronc Doron Cohen added a comment - Test failing with this bug.
        Hide
        doronc Doron Cohen added a comment -

        refine summary and beautify description.

        Show
        doronc Doron Cohen added a comment - refine summary and beautify description.

          People

          • Assignee:
            doronc Doron Cohen
            Reporter:
            snowhow Tushar B
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development