Lucene - Core
  1. Lucene - Core
  2. LUCENE-5444

offsets in MemoryIndex broken when adding field with more than once

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.6.1
    • Fix Version/s: 4.7, 6.0
    • Component/s: core/index
    • Labels:
    • Lucene Fields:
      New, Patch Available

      Description

      When fields with the same name are added more than once to MemoryIndex, the offsets of the previous additions of the field do not seem to be taken into account. As a result, MemoryIndex cannot be used for example with the vector highlighter.

      1. LUCENE-5444.patch
        12 kB
        Simon Willnauer
      2. LUCENE-5444.patch
        12 kB
        britta weber
      3. LUCENE-5444.patch
        11 kB
        britta weber
      4. LUCENE-5444.patch
        11 kB
        britta weber

        Activity

        Hide
        britta weber added a comment -

        This patch includes a small test which compares the term vectors from MemoryIndex with those from RAMDirectory. It fails for the offsets. I also added a crude fix for MemoryIndex.

        Show
        britta weber added a comment - This patch includes a small test which compares the term vectors from MemoryIndex with those from RAMDirectory. It fails for the offsets. I also added a crude fix for MemoryIndex.
        Hide
        Simon Willnauer added a comment -

        Hey Britta, thanks for opening this issue! I actually think the patch looks pretty good though. Could you move the testcase into MemoryIndexTest instead? We usually use a utility method to create a new directory & index writers in tests like this:

         Directory dir = newDirectory();
         MockAnalyzer mockAnalyzer = new MockAnalyzer(random());
         IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(random(), TEST_VERSION_CURRENT, mockAnalyzer)); 
        

        I also think from looking at it the test must fail since you are not closing the index writer, index reader and the directory. I guess you should move most of it into a single method and only keep the compareTermVectors in a sep. method. Additionally I think since your patch allows it we should also test the `offsetGap` could you add another test for this as well? If you do so feel free to add a getOffsetGap(String) method to MockAnalyzer so you can reuse your tests!

        Show
        Simon Willnauer added a comment - Hey Britta, thanks for opening this issue! I actually think the patch looks pretty good though. Could you move the testcase into MemoryIndexTest instead? We usually use a utility method to create a new directory & index writers in tests like this: Directory dir = newDirectory(); MockAnalyzer mockAnalyzer = new MockAnalyzer(random()); IndexWriter writer = new IndexWriter(dir, newIndexWriterConfig(random(), TEST_VERSION_CURRENT, mockAnalyzer)); I also think from looking at it the test must fail since you are not closing the index writer, index reader and the directory. I guess you should move most of it into a single method and only keep the compareTermVectors in a sep. method. Additionally I think since your patch allows it we should also test the `offsetGap` could you add another test for this as well? If you do so feel free to add a getOffsetGap(String) method to MockAnalyzer so you can reuse your tests!
        Hide
        britta weber added a comment -

        Thanks a lot for the quick response!
        Here is a new patch with the requested changes.

        I added a method setOffsetGap(..) and getOffsetGap() to MockAnalyzer and now set the offsetGap random in the test, which sort of tests the gapOffset as well. I did not add a separate test for the offsetGap though. Can you hint at what exactly I should be testing?

        Also, I found that the String parameter for MockAnalyzer.setPositionIncrementGap is not really used. I added it to setOffsetGap for consistency but it makes no sense. Should it be removed?

        I close everything now that has a close method. I am now a little surprised that the missing closes caused no test failure in the original patch. Could this be a problem?

        Show
        britta weber added a comment - Thanks a lot for the quick response! Here is a new patch with the requested changes. I added a method setOffsetGap(..) and getOffsetGap() to MockAnalyzer and now set the offsetGap random in the test, which sort of tests the gapOffset as well. I did not add a separate test for the offsetGap though. Can you hint at what exactly I should be testing? Also, I found that the String parameter for MockAnalyzer.setPositionIncrementGap is not really used. I added it to setOffsetGap for consistency but it makes no sense. Should it be removed? I close everything now that has a close method. I am now a little surprised that the missing closes caused no test failure in the original patch. Could this be a problem?
        Hide
        Simon Willnauer added a comment - - edited

        Hey Britta,

        the parameter should be offsetGap rather than positionIncrementGap no?

         public void setOffsetGap(int positionIncrementGap){
        

        Can you hint at what exactly I should be testing?

        you are passing an random offset (btw. the Math.abs call is not needed since it should give you a positive int anyways) to the mock analyzer that is what I meant by testing the offset gap.

        please just fix that parameter name and I think we are good to go! thanks for fixing all the stuff so quickly!

        Show
        Simon Willnauer added a comment - - edited Hey Britta, the parameter should be offsetGap rather than positionIncrementGap no? public void setOffsetGap(int positionIncrementGap){ Can you hint at what exactly I should be testing? you are passing an random offset (btw. the Math.abs call is not needed since it should give you a positive int anyways) to the mock analyzer that is what I meant by testing the offset gap. please just fix that parameter name and I think we are good to go! thanks for fixing all the stuff so quickly!
        Hide
        britta weber added a comment -

        Here you go, also added javadoc to MockAnalyzer.set/getOffsetGap.

        Show
        britta weber added a comment - Here you go, also added javadoc to MockAnalyzer.set/getOffsetGap.
        Hide
        Simon Willnauer added a comment -

        looks awesome - I will commit this in a bit!

        Show
        Simon Willnauer added a comment - looks awesome - I will commit this in a bit!
        Hide
        Uwe Schindler added a comment -

        +1 to commit. Looks correct. I would just prefer to explicitely intialize fields (although they are 0) in MockAnalyzer.

        Show
        Uwe Schindler added a comment - +1 to commit. Looks correct. I would just prefer to explicitely intialize fields (although they are 0) in MockAnalyzer.
        Hide
        Simon Willnauer added a comment -

        I ran the tests and found some bugs - the actual value to initialize the offsetGap is 1 rather than 0 - I also fixed the MockAnalyzer to pull the super.getOffsetGap() if it is not set so it will always get the default we use. I also added a changes entry and will commit in a bit.

        Show
        Simon Willnauer added a comment - I ran the tests and found some bugs - the actual value to initialize the offsetGap is 1 rather than 0 - I also fixed the MockAnalyzer to pull the super.getOffsetGap() if it is not set so it will always get the default we use. I also added a changes entry and will commit in a bit.
        Hide
        ASF subversion and git services added a comment -

        Commit 1568294 from Simon Willnauer in branch 'dev/trunk'
        [ https://svn.apache.org/r1568294 ]

        LUCENE-5444: MemoryIndex did't respect the analyzers offset gap

        Show
        ASF subversion and git services added a comment - Commit 1568294 from Simon Willnauer in branch 'dev/trunk' [ https://svn.apache.org/r1568294 ] LUCENE-5444 : MemoryIndex did't respect the analyzers offset gap
        Hide
        ASF subversion and git services added a comment -

        Commit 1568314 from Simon Willnauer in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1568314 ]

        LUCENE-5444: MemoryIndex did't respect the analyzers offset gap

        Show
        ASF subversion and git services added a comment - Commit 1568314 from Simon Willnauer in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1568314 ] LUCENE-5444 : MemoryIndex did't respect the analyzers offset gap
        Hide
        Simon Willnauer added a comment -

        committed to trunk and 4.x

        Show
        Simon Willnauer added a comment - committed to trunk and 4.x
        Hide
        britta weber added a comment -

        Thanks a lot Simon and Uwe for doing this so quickly!

        Show
        britta weber added a comment - Thanks a lot Simon and Uwe for doing this so quickly!

          People

          • Assignee:
            Simon Willnauer
            Reporter:
            britta weber
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 168h
              168h
              Remaining:
              Remaining Estimate - 168h
              168h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development