Lucene - Core
  1. Lucene - Core
  2. LUCENE-5128

Calling IndexSearcher.searchAfter beyond the number of stored documents causes ArrayIndexOutOfBoundsException

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.5, 6.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      ArrayIndexOutOfBoundsException makes it harder to reason about the cause.

      Is there a better way to notify programmers of the cause?

      1. LUCENE-5128.patch
        6 kB
        Shai Erera
      2. LUCENE-5128.patch
        3 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        I made this trivial change and added a test. However, TestShardSearching.testSimple fails if run with -Dtests.seed=3626FF5A675BE28. What happens is that the shard sets after=Integer.MAX_VALUE to denote no more docs should be collected from that shard.

        Now, this is just test code and I'm sure there's a better way to handle it (i.e. why run search at all against that shard if no docs should be collected?), but I checked TopScoreDocCollector and TopFieldCollector to understand where would they throw AIOOBE and I couldn't find the place. Looks like they just ignore any doc passed to collect(). Could you please paste the stacktrace? Perhaps it's a different collector that we need to fix?

        I will also check if it's possible to fix test shards to not search this shard if no docs should be collected

        Show
        Shai Erera added a comment - I made this trivial change and added a test. However, TestShardSearching.testSimple fails if run with -Dtests.seed=3626FF5A675BE28. What happens is that the shard sets after=Integer.MAX_VALUE to denote no more docs should be collected from that shard. Now, this is just test code and I'm sure there's a better way to handle it (i.e. why run search at all against that shard if no docs should be collected?), but I checked TopScoreDocCollector and TopFieldCollector to understand where would they throw AIOOBE and I couldn't find the place. Looks like they just ignore any doc passed to collect(). Could you please paste the stacktrace? Perhaps it's a different collector that we need to fix? I will also check if it's possible to fix test shards to not search this shard if no docs should be collected
        Hide
        Shai Erera added a comment -

        So, while not sending a query to a shard which will return no results is logically ok, there is an issue with it. TopDocs.merge merges TopDocs[] and part of the process is to compute a totalHitsCount, so that the merged TopDocs.totalHits is the sum of all shards' totalHits. Now, imagine that we didn't submit a query to a shard with no more results, then what would happen is that the previous search would return mergedTotalHits=N and the follow-on search (e.g. clicking 'next') would return mergedTotalHits=N-K, where K is the totalHits from the shard that we omitted.

        Fixing the test is not easy unless we cache somewhere the totalHits from the previous round. And I assume an app will need to do that too if it wants to omit a shard from a search. But this looks unrelated to this issue, i.e. it's some sort of optimization that we need to put somewhere, maybe searchAfter should also record totalHits for that shard, I'm not sure. But it's unrelated to this issue.

        So I think at this point it would be better to fix whatever collector that is sensitive to searchAfter exceeding maxDoc, than to fix IndexSearcher to fail fast.

        Show
        Shai Erera added a comment - So, while not sending a query to a shard which will return no results is logically ok, there is an issue with it. TopDocs.merge merges TopDocs[] and part of the process is to compute a totalHitsCount, so that the merged TopDocs.totalHits is the sum of all shards' totalHits. Now, imagine that we didn't submit a query to a shard with no more results, then what would happen is that the previous search would return mergedTotalHits=N and the follow-on search (e.g. clicking 'next') would return mergedTotalHits=N-K, where K is the totalHits from the shard that we omitted. Fixing the test is not easy unless we cache somewhere the totalHits from the previous round. And I assume an app will need to do that too if it wants to omit a shard from a search. But this looks unrelated to this issue, i.e. it's some sort of optimization that we need to put somewhere, maybe searchAfter should also record totalHits for that shard, I'm not sure. But it's unrelated to this issue. So I think at this point it would be better to fix whatever collector that is sensitive to searchAfter exceeding maxDoc, than to fix IndexSearcher to fail fast.
        Hide
        Michael McCandless added a comment -

        I think the core change makes sense. If you pass a too-big doc to searchAfter, something is buggy with the app.

        And we should fix TestShardSearching to not abuse the API like that ... it should do its own tracking.

        But, it'd be good to see the actual stacktrace here first ...

        Show
        Michael McCandless added a comment - I think the core change makes sense. If you pass a too-big doc to searchAfter, something is buggy with the app. And we should fix TestShardSearching to not abuse the API like that ... it should do its own tracking. But, it'd be good to see the actual stacktrace here first ...
        Hide
        Shai Erera added a comment -

        it should do its own tracking

        That's the part that bothers me. If we don't allow passing such searchAfter, we're basically telling apps that they should keep track of their shards, which means they need to maintain state (not that it's hard to maintain in this case). I'd rather first see the exception because from what I checked both TopScoreDocsCollector and TopFieldsCollector should be fine with searchAfter >= reader.maxDoc(), and IndexSearch.searchAfter returns a TopDocs so there aren't many collectors which can be affected. Except maybe if crocket uses DrillSideways...

        On one hand the core fix makes sense, on the other hand it may unnecessarily complicate apps life. Especially if they need to obtain totalHits ... I wonder if in that case they can run with TotalHitCountCollector instead and construct an empty TopDocs? That's what happens when you currently pass searchAfter=Integer.MAX_VALUE. I'll try to fix the test like that to see if it helps.

        Show
        Shai Erera added a comment - it should do its own tracking That's the part that bothers me. If we don't allow passing such searchAfter, we're basically telling apps that they should keep track of their shards, which means they need to maintain state (not that it's hard to maintain in this case). I'd rather first see the exception because from what I checked both TopScoreDocsCollector and TopFieldsCollector should be fine with searchAfter >= reader.maxDoc(), and IndexSearch.searchAfter returns a TopDocs so there aren't many collectors which can be affected. Except maybe if crocket uses DrillSideways... On one hand the core fix makes sense, on the other hand it may unnecessarily complicate apps life. Especially if they need to obtain totalHits ... I wonder if in that case they can run with TotalHitCountCollector instead and construct an empty TopDocs? That's what happens when you currently pass searchAfter=Integer.MAX_VALUE. I'll try to fix the test like that to see if it helps.
        Hide
        Shai Erera added a comment -

        I had a chat w/ Mike about what's going on in the test and turns out the test is fine, comments needed some improvements. So I fixed IndexSearcher to throw IllegalArgEx if searchAfter.doc >= maxDoc + fixed test comments (I think they better explain what's going on there – Mike, appreciate your review!) and set after.doc to maxDoc-1.

        I think it's ready. This all makes sense now (to me). Core tests pass. I'd appreciate some review, but I intend to commit this tomorrow.

        Show
        Shai Erera added a comment - I had a chat w/ Mike about what's going on in the test and turns out the test is fine, comments needed some improvements. So I fixed IndexSearcher to throw IllegalArgEx if searchAfter.doc >= maxDoc + fixed test comments (I think they better explain what's going on there – Mike, appreciate your review!) and set after.doc to maxDoc-1. I think it's ready. This all makes sense now (to me). Core tests pass. I'd appreciate some review, but I intend to commit this tomorrow.
        Hide
        Michael McCandless added a comment -

        +1, patch looks great. Thanks Shai!

        Show
        Michael McCandless added a comment - +1, patch looks great. Thanks Shai!
        Hide
        ASF subversion and git services added a comment -

        Commit 1505909 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1505909 ]

        LUCENE-5128: IndexSearcher.searchAfter should throw IllegalArgumentException if after.doc >= reader.maxDoc()

        Show
        ASF subversion and git services added a comment - Commit 1505909 from Shai Erera in branch 'dev/trunk' [ https://svn.apache.org/r1505909 ] LUCENE-5128 : IndexSearcher.searchAfter should throw IllegalArgumentException if after.doc >= reader.maxDoc()
        Hide
        ASF subversion and git services added a comment -

        Commit 1505910 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1505910 ]

        LUCENE-5128: IndexSearcher.searchAfter should throw IllegalArgumentException if after.doc >= reader.maxDoc()

        Show
        ASF subversion and git services added a comment - Commit 1505910 from Shai Erera in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1505910 ] LUCENE-5128 : IndexSearcher.searchAfter should throw IllegalArgumentException if after.doc >= reader.maxDoc()
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. Closing it now, crocket - still it would be good if you can paste the full stacktrace, so we can check if there are collectors that are sensitive to that.

        Show
        Shai Erera added a comment - Committed to trunk and 4x. Closing it now, crocket - still it would be good if you can paste the full stacktrace, so we can check if there are collectors that are sensitive to that.
        Hide
        crocket added a comment -

        Wait until this weekend. I'm going to check the stacktrace this saturday.

        Show
        crocket added a comment - Wait until this weekend. I'm going to check the stacktrace this saturday.
        Hide
        crocket added a comment - - edited

        Man, I was so foolish.

        It was my mistake.

        TopDocs searchResult=searcher.search(query, hitsPerPage, sort);
        for(Integer p=1; p<page; ++p) {
            searchResult=searcher.searchAfter(
                    searchResult.scoreDocs[searchResult.scoreDocs.length-1],
                    query, hitsPerPage, sort);
            System.out.println(searchResult.scoreDocs.length);
        }

        In the above code snippet, searchResult.scoreDocs.length becomes 0 when I search beyond the number of documents.
        But my code executes searchResult.scoreDocs[searchResult.scoreDocs.length-1] even after searchResult.scoreDocs.length becomes 0.

        That's why I faced that exception.

        Sorry for that.

        if I reported a false issue, what issue did you solve, then?

        Show
        crocket added a comment - - edited Man, I was so foolish. It was my mistake. TopDocs searchResult=searcher.search(query, hitsPerPage, sort); for(Integer p=1; p<page; ++p) { searchResult=searcher.searchAfter( searchResult.scoreDocs[searchResult.scoreDocs.length-1], query, hitsPerPage, sort); System.out.println(searchResult.scoreDocs.length); } In the above code snippet, searchResult.scoreDocs.length becomes 0 when I search beyond the number of documents. But my code executes searchResult.scoreDocs [searchResult.scoreDocs.length-1] even after searchResult.scoreDocs.length becomes 0. That's why I faced that exception. Sorry for that. if I reported a false issue, what issue did you solve, then?
        Hide
        Shai Erera added a comment -

        Hehe, no problem. Your report uncovered a potential misuse of the searchAfter API. You should always pass the last ScoreDoc that you obtained from the last search. Your code above uses it correctly, although you should check for scoreDocs.length == 0, as you noted.

        After enforcing that limitation in IndexSearcher, one test misused the API and had to be fixed as well.

        So your report, while a false alarm, led to an improvement of the API . Thank you for that!

        Show
        Shai Erera added a comment - Hehe, no problem. Your report uncovered a potential misuse of the searchAfter API. You should always pass the last ScoreDoc that you obtained from the last search. Your code above uses it correctly, although you should check for scoreDocs.length == 0, as you noted. After enforcing that limitation in IndexSearcher, one test misused the API and had to be fixed as well. So your report, while a false alarm, led to an improvement of the API . Thank you for that!
        Hide
        Adrien Grand added a comment -

        4.5 release -> bulk close

        Show
        Adrien Grand added a comment - 4.5 release -> bulk close

          People

          • Assignee:
            Shai Erera
            Reporter:
            crocket
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development