Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7293

Highlighting with GeoPointInBBoxQuery can result in Exception

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 5.5, 6.0, 7.0
    • Fix Version/s: 6.1, 6.0.1, 7.0
    • Component/s: modules/highlighter
    • Labels:
      None

      Description

      Highlighter and GeoPointInBBoxQuery don't play well together. I wrote a test here that throws an exception which I think it should not:

      https://github.com/brwe/lucene-solr/commit/311f5527ffb6f3ef50e3f74b54456aa7b29d8898

      The problem seems to be that GeoPointInBBoxQuery calls rewrite with a reader that contains a text field (which we want to highlight) and therefore has the wrong encoding.

      This is from an elasticsearch issue: https://github.com/elastic/elasticsearch/issues/17537

      1. LUCENE-7293.patch
        8 kB
        Michael McCandless
      2. LUCENE-7293.patch
        7 kB
        Michael McCandless
      3. LUCENE-7293-revert+fix.patch
        9 kB
        Uwe Schindler
      4. LUCENE-7293-revert+fix.patch
        9 kB
        Uwe Schindler

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        Thanks britta weber! This looks like a real bug ... here's a patch showing the test failure:

            [mkdir] Created dir: /l/60/lucene/build/highlighter/test
        [junit4:pickseed] Seed property 'tests.seed' already defined: 26897D04349F7E78
            [mkdir] Created dir: /l/60/lucene/build/highlighter/test/temp
           [junit4] <JUnit4> says ahoj! Master seed: 26897D04349F7E78
           [junit4] Executing 1 suite with 1 JVM.
           [junit4] 
           [junit4] Started J0 PID(5385@localhost).
           [junit4] Suite: org.apache.lucene.search.highlight.HighlighterTest
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=HighlighterTest -Dtests.method=testGeoPointQueryHighlight -Dtests.seed=26897D04349F7E78 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ar-OM -Dtests.timezone=Europe/Bratislava -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] ERROR   0.34s | HighlighterTest.testGeoPointQueryHighlight <<<
           [junit4]    > Throwable #1: java.lang.NumberFormatException: Invalid shift value (110) in prefixCoded bytes (is encoded value really a geo point?)
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([26897D04349F7E78:86177FE84F9CD7A0]:0)
           [junit4]    > 	at org.apache.lucene.spatial.util.GeoEncodingUtils.getPrefixCodedShift(GeoEncodingUtils.java:135)
           [junit4]    > 	at org.apache.lucene.spatial.geopoint.search.GeoPointPrefixTermsEnum.accept(GeoPointPrefixTermsEnum.java:219)
           [junit4]    > 	at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:232)
           [junit4]    > 	at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:67)
           [junit4]    > 	at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:108)
           [junit4]    > 	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:222)
           [junit4]    > 	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:229)
           [junit4]    > 	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:112)
           [junit4]    > 	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:213)
           [junit4]    > 	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:508)
           [junit4]    > 	at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:218)
           [junit4]    > 	at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:186)
           [junit4]    > 	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:195)
           [junit4]    > 	at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:155)
           [junit4]    > 	at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
           [junit4]    > 	at org.apache.lucene.search.highlight.HighlighterTest.testGeoPointQueryHighlight(HighlighterTest.java:214)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {contents=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=1152, maxMBSortInHeap=5.611746652077259, sim=ClassicSimilarity, locale=ar-OM, timezone=Europe/Bratislava
           [junit4]   2> NOTE: Linux 3.13.0-86-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=393936840,total=504889344
           [junit4]   2> NOTE: All tests run in this JVM: [HighlighterTest]
           [junit4] Completed [1/1 (1!)] in 0.54s, 1 test, 1 error <<< FAILURES!
        
        Show
        mikemccand Michael McCandless added a comment - Thanks britta weber ! This looks like a real bug ... here's a patch showing the test failure: [mkdir] Created dir: /l/60/lucene/build/highlighter/test [junit4:pickseed] Seed property 'tests.seed' already defined: 26897D04349F7E78 [mkdir] Created dir: /l/60/lucene/build/highlighter/test/temp [junit4] <JUnit4> says ahoj! Master seed: 26897D04349F7E78 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(5385@localhost). [junit4] Suite: org.apache.lucene.search.highlight.HighlighterTest [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=HighlighterTest -Dtests.method=testGeoPointQueryHighlight -Dtests.seed=26897D04349F7E78 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ar-OM -Dtests.timezone=Europe/Bratislava -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.34s | HighlighterTest.testGeoPointQueryHighlight <<< [junit4] > Throwable #1: java.lang.NumberFormatException: Invalid shift value (110) in prefixCoded bytes (is encoded value really a geo point?) [junit4] > at __randomizedtesting.SeedInfo.seed([26897D04349F7E78:86177FE84F9CD7A0]:0) [junit4] > at org.apache.lucene.spatial.util.GeoEncodingUtils.getPrefixCodedShift(GeoEncodingUtils.java:135) [junit4] > at org.apache.lucene.spatial.geopoint.search.GeoPointPrefixTermsEnum.accept(GeoPointPrefixTermsEnum.java:219) [junit4] > at org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:232) [junit4] > at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:67) [junit4] > at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:108) [junit4] > at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:222) [junit4] > at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:229) [junit4] > at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:112) [junit4] > at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:213) [junit4] > at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:508) [junit4] > at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:218) [junit4] > at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:186) [junit4] > at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:195) [junit4] > at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:155) [junit4] > at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101) [junit4] > at org.apache.lucene.search.highlight.HighlighterTest.testGeoPointQueryHighlight(HighlighterTest.java:214) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60): {contents=PostingsFormat(name=LuceneFixedGap)}, docValues:{}, maxPointsInLeafNode=1152, maxMBSortInHeap=5.611746652077259, sim=ClassicSimilarity, locale=ar-OM, timezone=Europe/Bratislava [junit4] 2> NOTE: Linux 3.13.0-86-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=393936840,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [HighlighterTest] [junit4] Completed [1/1 (1!)] in 0.54s, 1 test, 1 error <<< FAILURES!
        Hide
        mikemccand Michael McCandless added a comment -

        The problem here is highlighter is "lying" to GeoPointInBBoxQuery by passing it textual terms that did not in fact come from a geopoint encoded field.

        I think it's fair game for the query to throw an exception in this case! In normal usage, e.g. if a user does this, it's a bad error case.

        I think to fix this we should change WeightedSpanTermExtractor.extract to skip all geopoint queries?

        Show
        mikemccand Michael McCandless added a comment - The problem here is highlighter is "lying" to GeoPointInBBoxQuery by passing it textual terms that did not in fact come from a geopoint encoded field. I think it's fair game for the query to throw an exception in this case! In normal usage, e.g. if a user does this, it's a bad error case. I think to fix this we should change WeightedSpanTermExtractor.extract to skip all geopoint queries?
        Hide
        dsmiley David Smiley added a comment -

        I think to fix this we should change WeightedSpanTermExtractor.extract to skip all geopoint queries?

        +1

        All those explicit query instanceof checks in there is a code smell... Not to be resolved in this issue certainly. The other 2 highlighters also have large query instanceof checks as well but they aren't impacted in the same way as each other. Maybe a marker interface would be useful.

        Show
        dsmiley David Smiley added a comment - I think to fix this we should change WeightedSpanTermExtractor.extract to skip all geopoint queries? +1 All those explicit query instanceof checks in there is a code smell... Not to be resolved in this issue certainly. The other 2 highlighters also have large query instanceof checks as well but they aren't impacted in the same way as each other. Maybe a marker interface would be useful.
        Hide
        mikemccand Michael McCandless added a comment -

        New patch, fixing the bug by avoiding GeoPointInBBoxQuery.

        I think we should fix this for 6.0.1 too.

        Show
        mikemccand Michael McCandless added a comment - New patch, fixing the bug by avoiding GeoPointInBBoxQuery . I think we should fix this for 6.0.1 too.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 7d0d500bccf81e407c7978c35fad6b12e6157ff1 in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7d0d500 ]

        LUCENE-7293: don't try to highlight GeoPoint queries

        Show
        jira-bot ASF subversion and git services added a comment - Commit 7d0d500bccf81e407c7978c35fad6b12e6157ff1 in lucene-solr's branch refs/heads/branch_6_0 from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7d0d500 ] LUCENE-7293 : don't try to highlight GeoPoint queries
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 78e92a05365b6642f3c880b0fefaedf4102c6457 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=78e92a0 ]

        LUCENE-7293: don't try to highlight GeoPoint queries

        Show
        jira-bot ASF subversion and git services added a comment - Commit 78e92a05365b6642f3c880b0fefaedf4102c6457 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=78e92a0 ] LUCENE-7293 : don't try to highlight GeoPoint queries
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit aa90b88e2d214d7c999c2c678e92beaec9781100 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aa90b88 ]

        LUCENE-7293: don't try to highlight GeoPoint queries

        Show
        jira-bot ASF subversion and git services added a comment - Commit aa90b88e2d214d7c999c2c678e92beaec9781100 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aa90b88 ] LUCENE-7293 : don't try to highlight GeoPoint queries
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks britta weber!

        Show
        mikemccand Michael McCandless added a comment - Thanks britta weber !
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit b09a5c80c677f6b8d61fd0dac5bf5caf98aad729 in lucene-solr's branch refs/heads/master from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b09a5c8 ]

        LUCENE-7293: Fix benchmark dependencies

        Show
        jira-bot ASF subversion and git services added a comment - Commit b09a5c80c677f6b8d61fd0dac5bf5caf98aad729 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b09a5c8 ] LUCENE-7293 : Fix benchmark dependencies
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 9e793672e4999eca051967305a8b29db51b952cd in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9e79367 ]

        LUCENE-7293: Fix benchmark dependencies

        Show
        jira-bot ASF subversion and git services added a comment - Commit 9e793672e4999eca051967305a8b29db51b952cd in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9e79367 ] LUCENE-7293 : Fix benchmark dependencies
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 6cee1aa8e1e197f71388a84c4da2f5864f79693a in lucene-solr's branch refs/heads/branch_6_0 from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6cee1aa ]

        LUCENE-7293: Fix benchmark dependencies

        Show
        jira-bot ASF subversion and git services added a comment - Commit 6cee1aa8e1e197f71388a84c4da2f5864f79693a in lucene-solr's branch refs/heads/branch_6_0 from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6cee1aa ] LUCENE-7293 : Fix benchmark dependencies
        Hide
        thetaphi Uwe Schindler added a comment -

        Hi,
        I reopening this, because I have a problem in let the whole highlighting module depend on the spatial module, just to check that something is not a query.

        I'd suggest to do a more dynamic check (e.g. compare string class name or similar). If we also need to match subclasses, I'd suggest to try to Class.forName the spatial class and use Class.isAssignableFrom, if found.

        If you revert and fix the issue, please also don't forget to revert the benchmark module's changes.

        Show
        thetaphi Uwe Schindler added a comment - Hi, I reopening this, because I have a problem in let the whole highlighting module depend on the spatial module, just to check that something is not a query. I'd suggest to do a more dynamic check (e.g. compare string class name or similar). If we also need to match subclasses, I'd suggest to try to Class.forName the spatial class and use Class.isAssignableFrom, if found. If you revert and fix the issue, please also don't forget to revert the benchmark module's changes.
        Hide
        thetaphi Uwe Schindler added a comment - - edited

        In addition the new dependency was not correctly added to the Maven POM + you changed dependnecies in a minor version upgrade - this is a no-go. So please, please - revert this and fix in a dyanmic way without a hard dependency just to check that something is not supported.

        Show
        thetaphi Uwe Schindler added a comment - - edited In addition the new dependency was not correctly added to the Maven POM + you changed dependnecies in a minor version upgrade - this is a no-go. So please, please - revert this and fix in a dyanmic way without a hard dependency just to check that something is not supported.
        Hide
        dsmiley David Smiley added a comment -

        +1 Uwe; good catch.

        Show
        dsmiley David Smiley added a comment - +1 Uwe; good catch.
        Hide
        thetaphi Uwe Schindler added a comment -

        Here is my proposal (revert + new fix).

        Show
        thetaphi Uwe Schindler added a comment - Here is my proposal (revert + new fix).
        Hide
        thetaphi Uwe Schindler added a comment -

        An alternative would be to just check the package name and exclude all spatial queries from highlighting. This is less fragile.

        Show
        thetaphi Uwe Schindler added a comment - An alternative would be to just check the package name and exclude all spatial queries from highlighting. This is less fragile.
        Hide
        mikemccand Michael McCandless added a comment -

        +1 Uwe Schindler, please push that!!

        Show
        mikemccand Michael McCandless added a comment - +1 Uwe Schindler , please push that!!
        Hide
        thetaphi Uwe Schindler added a comment -

        Would you prefer a less-fragile version that just excludes all queries where the package begins with "org.apache.lucene.spatial."? Do we have support for any geo query at all?

        Show
        thetaphi Uwe Schindler added a comment - Would you prefer a less-fragile version that just excludes all queries where the package begins with " org.apache.lucene.spatial. "? Do we have support for any geo query at all?
        Hide
        mikemccand Michael McCandless added a comment -

        Would you prefer a less-fragile version

        +1

        Show
        mikemccand Michael McCandless added a comment - Would you prefer a less-fragile version +1
        Hide
        thetaphi Uwe Schindler added a comment - - edited

        New patch that disables all spatial and spatial3d queries from highlighting.

        Show
        thetaphi Uwe Schindler added a comment - - edited New patch that disables all spatial and spatial3d queries from highlighting.
        Hide
        thetaphi Uwe Schindler added a comment -

        (just noticed: Solr also failed in same way)

        Show
        thetaphi Uwe Schindler added a comment - (just noticed: Solr also failed in same way)
        Hide
        mikemccand Michael McCandless added a comment -

        Ugh, woops, sorry for the noise Thanks for fixing Uwe Schindler!

        Show
        mikemccand Michael McCandless added a comment - Ugh, woops, sorry for the noise Thanks for fixing Uwe Schindler !
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit eecab95aa6199332aeb54f2e013d2341396944df in lucene-solr's branch refs/heads/master from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eecab95 ]

        Revert and fix "LUCENE-7293: don't try to highlight GeoPoint queries"

        Show
        jira-bot ASF subversion and git services added a comment - Commit eecab95aa6199332aeb54f2e013d2341396944df in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eecab95 ] Revert and fix " LUCENE-7293 : don't try to highlight GeoPoint queries"
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 25e913211b9c4b28448c0db699d2f6df77fdfcf5 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=25e9132 ]

        Revert and fix "LUCENE-7293: don't try to highlight GeoPoint queries"

        Show
        jira-bot ASF subversion and git services added a comment - Commit 25e913211b9c4b28448c0db699d2f6df77fdfcf5 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=25e9132 ] Revert and fix " LUCENE-7293 : don't try to highlight GeoPoint queries"
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 370976035c6e39db3f086e9f741dfd0be92be5a8 in lucene-solr's branch refs/heads/branch_6_0 from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3709760 ]

        Revert and fix "LUCENE-7293: don't try to highlight GeoPoint queries"

        Show
        jira-bot ASF subversion and git services added a comment - Commit 370976035c6e39db3f086e9f741dfd0be92be5a8 in lucene-solr's branch refs/heads/branch_6_0 from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3709760 ] Revert and fix " LUCENE-7293 : don't try to highlight GeoPoint queries"
        Hide
        steve_rowe Steve Rowe added a comment -

        Bulk close issues included in the 6.0.1 release.

        Show
        steve_rowe Steve Rowe added a comment - Bulk close issues included in the 6.0.1 release.

          People

          • Assignee:
            thetaphi Uwe Schindler
            Reporter:
            a2tirb britta weber
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development