Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7491

Unexpected merge exception when merging sparse points fields

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0, 6.3
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from this user thread: http://markmail.org/thread/vwdvjgupyz6heep5

      If you have a segment that has points, but a given field ("id") didn't index points, and a later segment where field "id" does index points, when try to merge those segments we hit this (incorrect!) exception:

      Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.lang.IllegalArgumentException: field="id" did not index point values
      	at __randomizedtesting.SeedInfo.seed([9F3E7B030EF482BD]:0)
      	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:668)
      	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:648)
      Caused by: java.lang.IllegalArgumentException: field="id" did not index point values
      	at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.getBKDReader(Lucene60PointsReader.java:126)
      	at org.apache.lucene.codecs.lucene60.Lucene60PointsReader.size(Lucene60PointsReader.java:224)
      	at org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:169)
      	at org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
      	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
      	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4287)
      	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3864)
      	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
      	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
      
      1. LUCENE-7491.patch
        6 kB
        Michael McCandless
      2. LUCENE-7491.patch
        1 kB
        Michael McCandless

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        Patch w/ test case showing the issue ... I'll work on a fix next.

        Show
        mikemccand Michael McCandless added a comment - Patch w/ test case showing the issue ... I'll work on a fix next.
        Hide
        jpountz Adrien Grand added a comment -

        To make things less trappy, I'm wondering that maybe LeafReader.getPointValues() should never return null. Otherwise the code gets into different branches depending on whether other fields index points or not.

        Show
        jpountz Adrien Grand added a comment - To make things less trappy, I'm wondering that maybe LeafReader.getPointValues() should never return null . Otherwise the code gets into different branches depending on whether other fields index points or not.
        Hide
        mikemccand Michael McCandless added a comment -

        Patch w/ the fix ... the problem was that the merge logic was assuming just because one segment had points for a given field that all segments must have points for that field, which is clearly not the case!

        Show
        mikemccand Michael McCandless added a comment - Patch w/ the fix ... the problem was that the merge logic was assuming just because one segment had points for a given field that all segments must have points for that field, which is clearly not the case!
        Hide
        mikemccand Michael McCandless added a comment -

        To make things less trappy, I'm wondering that maybe LeafReader.getPointValues() should never return null.

        We could maybe separately consider that? I don't think that would have prevented this particular bug.

        I'm generally not really a fan of returning fake empty "should be null but caller can't be trusted" objects though: I think it's a degree of API leniency that if you take it to its limit, never ends, i.e. how deeply do you keep returning null as you dig into the fake object? These are quite expert APIs and I think it's reasonable to expect the caller to be careful with the return result...

        Today, a null return from LeafReader.getPointValues is meaningful: it notifies you this segment has no points indexed at all. We would be hiding that information if instead we returned a fake empty object.

        Not helping matters, I do realize we are inconsistent here, e.g. LeafReader.fields() is not null even if there were no postings in that segment, yet Fields.terms(String field) is null if the postings didn't have that field.

        Show
        mikemccand Michael McCandless added a comment - To make things less trappy, I'm wondering that maybe LeafReader.getPointValues() should never return null. We could maybe separately consider that? I don't think that would have prevented this particular bug. I'm generally not really a fan of returning fake empty "should be null but caller can't be trusted" objects though: I think it's a degree of API leniency that if you take it to its limit, never ends, i.e. how deeply do you keep returning null as you dig into the fake object? These are quite expert APIs and I think it's reasonable to expect the caller to be careful with the return result... Today, a null return from LeafReader.getPointValues is meaningful: it notifies you this segment has no points indexed at all. We would be hiding that information if instead we returned a fake empty object. Not helping matters, I do realize we are inconsistent here, e.g. LeafReader.fields() is not null even if there were no postings in that segment, yet Fields.terms(String field) is null if the postings didn't have that field.
        Hide
        jpountz Adrien Grand added a comment -

        I'm generally not really a fan of returning fake empty "should be null but caller can't be trusted" objects though

        I don't disagree with this statement but I like the current situation even less. It makes things hard to test because of the branches it creates. Say you want to test the point range query on field foo, you need to test what happens when no fields have points, when foo has points and when foo does not have points but other fields from the same segment do. If you don't like returning non-null even when no fields have points, then maybe we should consider making points work per field like doc values, so instead of having LeafReader.getPointValues() and all methods of PointValues that take a String fieldName parameter, we could have LeafReader.getPointValues(String fieldName) and remove all String fieldName parameters from PointValues?

        Show
        jpountz Adrien Grand added a comment - I'm generally not really a fan of returning fake empty "should be null but caller can't be trusted" objects though I don't disagree with this statement but I like the current situation even less. It makes things hard to test because of the branches it creates. Say you want to test the point range query on field foo , you need to test what happens when no fields have points, when foo has points and when foo does not have points but other fields from the same segment do. If you don't like returning non-null even when no fields have points, then maybe we should consider making points work per field like doc values, so instead of having LeafReader.getPointValues() and all methods of PointValues that take a String fieldName parameter, we could have LeafReader.getPointValues(String fieldName) and remove all String fieldName parameters from PointValues ?
        Hide
        mikemccand Michael McCandless added a comment -

        we could have LeafReader.getPointValues(String fieldName) and remove all String fieldName parameters from PointValues?

        I think that's compelling! Maybe open a new issue?

        But would this API return null if that field did not index points in that segment?

        Show
        mikemccand Michael McCandless added a comment - we could have LeafReader.getPointValues(String fieldName) and remove all String fieldName parameters from PointValues? I think that's compelling! Maybe open a new issue? But would this API return null if that field did not index points in that segment?
        Hide
        jpountz Adrien Grand added a comment -

        But would this API return null if that field did not index points in that segment?

        Yes, like doc values?

        Show
        jpountz Adrien Grand added a comment - But would this API return null if that field did not index points in that segment? Yes, like doc values?
        Hide
        mikemccand Michael McCandless added a comment -

        +1

        Show
        mikemccand Michael McCandless added a comment - +1
        Hide
        jpountz Adrien Grand added a comment -

        +1 to the patch

        Show
        jpountz Adrien Grand added a comment - +1 to the patch
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1b7a88f61ea44ecc873d7c7d135ce5c6ab88bb0a in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1b7a88f ]

        LUCENE-7491: fix merge exception if the same field has points in some segments but not in others

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1b7a88f61ea44ecc873d7c7d135ce5c6ab88bb0a in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1b7a88f ] LUCENE-7491 : fix merge exception if the same field has points in some segments but not in others
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 86b03358d59c584c89823e187b8806da48eb82af in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=86b0335 ]

        LUCENE-7491: fix merge exception if the same field has points in some segments but not in others

        Show
        jira-bot ASF subversion and git services added a comment - Commit 86b03358d59c584c89823e187b8806da48eb82af in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=86b0335 ] LUCENE-7491 : fix merge exception if the same field has points in some segments but not in others
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks Adrien Grand.

        Show
        mikemccand Michael McCandless added a comment - Thanks Adrien Grand .
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1a0786332d16eaefde013a4e075658387a3ae382 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a07863 ]

        LUCENE-7491: also allow for merging segment with points before one without points for a given field (thank you MockRandomMergePolicy!)

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1a0786332d16eaefde013a4e075658387a3ae382 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1a07863 ] LUCENE-7491 : also allow for merging segment with points before one without points for a given field (thank you MockRandomMergePolicy!)
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f4d3ca89802d06087780d7b7a1e7516c6b0f94c7 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f4d3ca8 ]

        LUCENE-7491: also allow for merging segment with points before one without points for a given field (thank you MockRandomMergePolicy!)

        Show
        jira-bot ASF subversion and git services added a comment - Commit f4d3ca89802d06087780d7b7a1e7516c6b0f94c7 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f4d3ca8 ] LUCENE-7491 : also allow for merging segment with points before one without points for a given field (thank you MockRandomMergePolicy!)
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        Closing after 6.3.0 release.

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - Closing after 6.3.0 release.

          People

          • Assignee:
            mikemccand Michael McCandless
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development