Lucene - Core
  1. Lucene - Core
  2. LUCENE-561

ParallelReader fails on deletes and on seeks of previously unused fields

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      All

      Description

      In using ParallelReader I've hit two bugs:

      1. ParallelReader.doDelete() and doUndeleteAll() call doDelete() and doUndeleteAll() on the subreaders, but these methods do not set hasChanges. Thus the changes are lost when the readers are closed. The fix is to call deleteDocument() and undeleteAll() on the subreaders instead.

      2. ParallelReader discovers the fields in each subindex by using IndexReader.getFieldNames() which only finds fields that have occurred on at least one document. In general a parallel index is designed with assignments of fields to sub-indexes and term seeks (including searches) may be done on any of those fields, even if no documents in a particular state of the index have yet had an assigned field. Seeks/searches on fields that have not yet been indexed generated an NPE in ParallelReader's various inner class seek() and next() methods because fieldToReader.get() returns null on the unseen field. The fix is to extend the add() methods to supply the correct list of fields for each subindex.

      Patch that corrects both of these issues attached.

      1. ParallelReaderBugs.patch
        3 kB
        Chuck Williams
      2. ParallelReaderBugs.patch
        6 kB
        Chuck Williams

        Activity

        Hide
        Chuck Williams added a comment -

        This new version of the patch provides a more general solution for the unknown field NPE problem (the solution for the delete problem is the same and is included in the new patch file).

        For the NPE problem, the new patch handles the case of an unknown field explicitly in all ParallelReader API's for which it is relevant. This is quite a few of them, but I think I got them all.

        The first solution of specifying the field list for each reader was expedient in my applicaiton since all fields are pre-configured and validated on index operations. However, for Lucene in general such field processing is not required. E.g., with ParallelReader as it is in Lucene now and with the first version of the patch, a user-entered misspelled field name passed in a Query to search() generates an NPE instead of zero results. With the new version of the patch, zero resutls are returned properly.

        Show
        Chuck Williams added a comment - This new version of the patch provides a more general solution for the unknown field NPE problem (the solution for the delete problem is the same and is included in the new patch file). For the NPE problem, the new patch handles the case of an unknown field explicitly in all ParallelReader API's for which it is relevant. This is quite a few of them, but I think I got them all. The first solution of specifying the field list for each reader was expedient in my applicaiton since all fields are pre-configured and validated on index operations. However, for Lucene in general such field processing is not required. E.g., with ParallelReader as it is in Lucene now and with the first version of the patch, a user-entered misspelled field name passed in a Query to search() generates an NPE instead of zero results. With the new version of the patch, zero resutls are returned properly.
        Hide
        Yonik Seeley added a comment -

        Chuck, I took a very quick scan, most of it looks fine.
        What about seek() calls though... if the reader is null, shouldn't termDocs be set to null?

        public void seek(Term term) throws IOException {

        • termDocs = ((IndexReader)fieldToReader.get(term.field())).termDocs(term);
          + IndexReader reader = ((IndexReader)fieldToReader.get(term.field()));
          + termDocs = reader!=null ? reader.termDocs(term) : null;
        Show
        Yonik Seeley added a comment - Chuck, I took a very quick scan, most of it looks fine. What about seek() calls though... if the reader is null, shouldn't termDocs be set to null? public void seek(Term term) throws IOException { termDocs = ((IndexReader)fieldToReader.get(term.field())).termDocs(term); + IndexReader reader = ((IndexReader)fieldToReader.get(term.field())); + termDocs = reader!=null ? reader.termDocs(term) : null;
        Hide
        Chuck Williams added a comment -

        Yonik, thanks for looking at this, and good catch! If there are multiple seeks on the termDocs, one is performed on a known field and not iterated to exahaustion, and then a seek is performed on an unknown field, then without your fix I believe the unknown field would erroneously report remaining documents on the prior known field.

        It appears nothing I've done hit this pariticular case! Sorry I missed it...

        Show
        Chuck Williams added a comment - Yonik, thanks for looking at this, and good catch! If there are multiple seeks on the termDocs, one is performed on a known field and not iterated to exahaustion, and then a seek is performed on an unknown field, then without your fix I believe the unknown field would erroneously report remaining documents on the prior known field. It appears nothing I've done hit this pariticular case! Sorry I missed it...
        Hide
        Yonik Seeley added a comment -

        This patch also fixes another little bug... the previous behavior bypassed synchronization of deleteDocument in subreaders.

        Show
        Yonik Seeley added a comment - This patch also fixes another little bug... the previous behavior bypassed synchronization of deleteDocument in subreaders.
        Hide
        Yonik Seeley added a comment -

        Patch applied, thanks Chuck!

        Show
        Yonik Seeley added a comment - Patch applied, thanks Chuck!
        Hide
        Christian Kohlschuetter added a comment -

        Chuck: Unfortunately, the NPE in seek/next can still be triggered.
        See LUCENE-398 for Testcase and a suggested patch.

        Show
        Christian Kohlschuetter added a comment - Chuck: Unfortunately, the NPE in seek/next can still be triggered. See LUCENE-398 for Testcase and a suggested patch.
        Hide
        Chuck Williams added a comment -

        Christian,

        That is a different bug than this one. This bug has been fixed.

        Chuck

        Show
        Chuck Williams added a comment - Christian, That is a different bug than this one. This bug has been fixed. Chuck

          People

          • Assignee:
            Yonik Seeley
            Reporter:
            Chuck Williams
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development