Lucene - Core
  1. Lucene - Core
  2. LUCENE-6697

Use 1D KD tree for alternative to postings based numeric range filters

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today Lucene uses postings to index a numeric value at multiple
      precision levels for fast range searching. It's somewhat costly: each
      numeric value is indexed with multiple terms (4 terms by default)
      ... I think a dedicated 1D BKD tree should be more compact and perform
      better.

      It should also easily generalize beyond 64 bits to arbitrary byte[],
      e.g. for LUCENE-5596, but I haven't explored that here.

      A 1D BKD tree just sorts all values, and then indexes adjacent leaf
      blocks of size 512-1024 (by default) values per block, and their
      docIDs, into a fully balanced binary tree. Building the range filter
      is then just a recursive walk through this tree.

      It's the same structure we use for 2D lat/lon BKD tree, just with 1D
      instead. I implemented it as a DocValuesFormat that also writes the
      numeric tree on the side.

      1. LUCENE-6697.patch
        116 kB
        Michael McCandless
      2. LUCENE-6697.patch
        119 kB
        Michael McCandless
      3. LUCENE-6697.patch
        95 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Work in progress patch, basically a fork of the 2D case.

        Show
        Michael McCandless added a comment - Work in progress patch, basically a fork of the 2D case.
        Hide
        Michael McCandless added a comment -

        For a benchmark I took the same ~60.8 M lat/lon points from
        OpenStreetMaps benchmark, but just indexed and searched the lat
        value. Benchmark source code is here: https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps1D.java

        Using LongField/NumericRangeQuery (trunk):

        • 152 sec to index
        • 42 sec to optimize
        • 491 MB total index
        • 5.6 MB heap for terms index
        • 9.6 sec for 225 queries total 667,206,375 hits

        Using 1D BKD tree (patch):

        • 290 sec to index
        • 256 sec to optimize
        • 442 MB total index (204 MB doc values, 233 MB BKD tree)
        • 5.2 sec for 225 queries total 667,206,375 hits
        • 1.0 MB heap used for KD index

        BKD tree is quite a bit slower to index, especially merging since it fully
        re-builds the binary tree, and it's using offline sorting, but then is
        ~46% faster to search, and uses less heap for the in-memory index
        structure.

        Index size is a bit smaller, but it is also storing doc values so
        e.g. you could sort by this field as well vs. LongField which would
        have to also index doc values to enable sorting = larger index size.

        Show
        Michael McCandless added a comment - For a benchmark I took the same ~60.8 M lat/lon points from OpenStreetMaps benchmark, but just indexed and searched the lat value. Benchmark source code is here: https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/IndexAndSearchOpenStreetMaps1D.java Using LongField/NumericRangeQuery (trunk): 152 sec to index 42 sec to optimize 491 MB total index 5.6 MB heap for terms index 9.6 sec for 225 queries total 667,206,375 hits Using 1D BKD tree (patch): 290 sec to index 256 sec to optimize 442 MB total index (204 MB doc values, 233 MB BKD tree) 5.2 sec for 225 queries total 667,206,375 hits 1.0 MB heap used for KD index BKD tree is quite a bit slower to index, especially merging since it fully re-builds the binary tree, and it's using offline sorting, but then is ~46% faster to search, and uses less heap for the in-memory index structure. Index size is a bit smaller, but it is also storing doc values so e.g. you could sort by this field as well vs. LongField which would have to also index doc values to enable sorting = larger index size.
        Hide
        Michael McCandless added a comment -

        It should also easily generalize beyond 64 bits to arbitrary byte[],
        e.g. for LUCENE-5596, but I haven't explored that here.

        I started to explore this, since it's an easy "fork" for the 1D numeric case, but I was stopped dead in my tracks when I tried to add the doc values integration ... I'm trying to wrap SortedSetDocValues, and unfortunately the iterables passed to addSortedSetField don't let me quickly look up the byte[] for each doc ... it's like a I somehow need to pull a DocValuesProducer at write time ...

        Show
        Michael McCandless added a comment - It should also easily generalize beyond 64 bits to arbitrary byte[], e.g. for LUCENE-5596 , but I haven't explored that here. I started to explore this, since it's an easy "fork" for the 1D numeric case, but I was stopped dead in my tracks when I tried to add the doc values integration ... I'm trying to wrap SortedSetDocValues, and unfortunately the iterables passed to addSortedSetField don't let me quickly look up the byte[] for each doc ... it's like a I somehow need to pull a DocValuesProducer at write time ...
        Hide
        Michael McCandless added a comment -

        I was stopped dead in my tracks

        Hmm, actually, I think I should simply use the ords, not the byte[] values, and then it will work well!

        Show
        Michael McCandless added a comment - I was stopped dead in my tracks Hmm, actually, I think I should simply use the ords, not the byte[] values, and then it will work well!
        Hide
        Michael McCandless added a comment -

        Another iteration, I think it's ready. I renamed the thing to RangeTree.

        It turned out it was simple to support fast ranges over arbitrary byte[], by requiring SortedSetDVs and just indexing the long ord value into the BKD tree which is already designed to index long values.

        This means range tree should be a fast (at search time) way to support range filtering for values that need more than 64 bits, e.g. BigInteger, INET v6, etc. (LUCENE-5596).

        There are some things we could do to make indexing faster, e.g. a single sequential scan over all points to fill the binary tree index values, avoid the "copy over again to fixed width" step, etc., but I think we can do these later...

        Show
        Michael McCandless added a comment - Another iteration, I think it's ready. I renamed the thing to RangeTree. It turned out it was simple to support fast ranges over arbitrary byte[], by requiring SortedSetDVs and just indexing the long ord value into the BKD tree which is already designed to index long values. This means range tree should be a fast (at search time) way to support range filtering for values that need more than 64 bits, e.g. BigInteger, INET v6, etc. ( LUCENE-5596 ). There are some things we could do to make indexing faster, e.g. a single sequential scan over all points to fill the binary tree index values, avoid the "copy over again to fixed width" step, etc., but I think we can do these later...
        Hide
        Adrien Grand added a comment -

        I don't understand everything yet but what I understand looks good. There are a few variable names that look like they come from the geo impl (eg. latSorter in RangeTreeWriter), and maybe NumericRangeTreeQuery.createWeight could reuse ConstantScoreWeight/ConstantScoreScorer (like most of our other constant-score queries do) to avoid reimplementing explain/normalize/etc.?

        Show
        Adrien Grand added a comment - I don't understand everything yet but what I understand looks good. There are a few variable names that look like they come from the geo impl (eg. latSorter in RangeTreeWriter), and maybe NumericRangeTreeQuery.createWeight could reuse ConstantScoreWeight/ConstantScoreScorer (like most of our other constant-score queries do) to avoid reimplementing explain/normalize/etc.?
        Hide
        Michael McCandless added a comment -

        Thanks for the review Adrien Grand!

        There are a few variable names that look like they come from the geo impl (eg. latSorter in RangeTreeWriter), a

        Woops, I'll fix

        maybe NumericRangeTreeQuery.createWeight could reuse ConstantScoreWeight/ConstantScoreScorer (like most of our other constant-score queries do) to avoid reimplementing explain/normalize/etc.?

        Oh, good idea, I'll do that.

        Show
        Michael McCandless added a comment - Thanks for the review Adrien Grand ! There are a few variable names that look like they come from the geo impl (eg. latSorter in RangeTreeWriter), a Woops, I'll fix maybe NumericRangeTreeQuery.createWeight could reuse ConstantScoreWeight/ConstantScoreScorer (like most of our other constant-score queries do) to avoid reimplementing explain/normalize/etc.? Oh, good idea, I'll do that.
        Hide
        Michael McCandless added a comment -

        New patch, folding in Adrien Grand's feedback (thanks!).

        I also changed search-time to an iterative impl (vs recursive before), which I think simplifies it quite a bit. Now it just finds the range of sequential blocks that need to be visited, and sweeps those, doing per-hit filtering only on the leading and trailing blocks that may have documents outside the requested range.

        I re-ran the benchmark and it's no faster but the searhc-time code is much simpler.

        There are corresponding simplifications on the indexing side but I'd like to defer... I think this is a good addition already.

        Show
        Michael McCandless added a comment - New patch, folding in Adrien Grand 's feedback (thanks!). I also changed search-time to an iterative impl (vs recursive before), which I think simplifies it quite a bit. Now it just finds the range of sequential blocks that need to be visited, and sweeps those, doing per-hit filtering only on the leading and trailing blocks that may have documents outside the requested range. I re-ran the benchmark and it's no faster but the searhc-time code is much simpler. There are corresponding simplifications on the indexing side but I'd like to defer... I think this is a good addition already.
        Hide
        Michael McCandless added a comment -

        I also changed search-time to an iterative impl (vs recursive before)

        Actually once nice benefit of this is it's easy to estimate up front how many hits we'll have, so I can call grow up front. I'll add that and commit soon...

        Show
        Michael McCandless added a comment - I also changed search-time to an iterative impl (vs recursive before) Actually once nice benefit of this is it's easy to estimate up front how many hits we'll have, so I can call grow up front. I'll add that and commit soon...
        Hide
        Michael McCandless added a comment -

        it's easy to estimate up front how many hits we'll have, so I can call grow up front

        This gives a nice speedup: ~5.2 sec down to ~4.7 sec for the "225 bbox lats around London" benchmark.

        Show
        Michael McCandless added a comment - it's easy to estimate up front how many hits we'll have, so I can call grow up front This gives a nice speedup: ~5.2 sec down to ~4.7 sec for the "225 bbox lats around London" benchmark.
        Hide
        ASF subversion and git services added a comment -

        Commit 1693682 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1693682 ]

        LUCENE-6697: add range tree for fast numeric and binary range filtering

        Show
        ASF subversion and git services added a comment - Commit 1693682 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1693682 ] LUCENE-6697 : add range tree for fast numeric and binary range filtering
        Hide
        ASF subversion and git services added a comment -

        Commit 1693683 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1693683 ]

        LUCENE-6697: add range tree for fast numeric and binary range filtering

        Show
        ASF subversion and git services added a comment - Commit 1693683 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1693683 ] LUCENE-6697 : add range tree for fast numeric and binary range filtering
        Hide
        ASF subversion and git services added a comment -

        Commit 1693693 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1693693 ]

        LUCENE-6697: rename NumericTree -> RangeTree

        Show
        ASF subversion and git services added a comment - Commit 1693693 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1693693 ] LUCENE-6697 : rename NumericTree -> RangeTree
        Hide
        ASF subversion and git services added a comment -

        Commit 1693694 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1693694 ]

        LUCENE-6697: rename NumericTree -> RangeTree

        Show
        ASF subversion and git services added a comment - Commit 1693694 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1693694 ] LUCENE-6697 : rename NumericTree -> RangeTree
        Hide
        ASF subversion and git services added a comment -

        Commit 1693800 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1693800 ]

        LUCENE-6697: test bug: the stupid-simple-yet-hopefully-bug-free numeric range filter was buggy

        Show
        ASF subversion and git services added a comment - Commit 1693800 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1693800 ] LUCENE-6697 : test bug: the stupid-simple-yet-hopefully-bug-free numeric range filter was buggy
        Hide
        ASF subversion and git services added a comment -

        Commit 1693801 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1693801 ]

        LUCENE-6697: test bug: the stupid-simple-yet-hopefully-bug-free numeric range filter was buggy

        Show
        ASF subversion and git services added a comment - Commit 1693801 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1693801 ] LUCENE-6697 : test bug: the stupid-simple-yet-hopefully-bug-free numeric range filter was buggy
        Hide
        Steve Rowe added a comment -

        Seeing 100% reproducible failure on branch_5x:

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testMultiValued -Dtests.seed=FD1D848DDE038459 -Dtests.slow=true -Dtests.locale=hr_HR -Dtests.timezone=Europe/Madrid -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] ERROR   0.05s J8 | TestRangeTree.testMultiValued <<<
           [junit4]    > Throwable #1: java.lang.IllegalArgumentException: maxValuesSortInHeap must be >= maxValuesInLeafNode; got 1250 vs maxValuesInLeafNode=2013
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([FD1D848DDE038459:293DE0BF10C1C411]:0)
           [junit4]    > 	at org.apache.lucene.rangetree.RangeTreeWriter.verifyParams(RangeTreeWriter.java:114)
           [junit4]    > 	at org.apache.lucene.rangetree.RangeTreeDocValuesFormat.<init>(RangeTreeDocValuesFormat.java:98)
           [junit4]    > 	at org.apache.lucene.rangetree.TestRangeTree.testMultiValued(TestRangeTree.java:128)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=hr_HR, timezone=Europe/Madrid
           [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=441475104,total=504365056
           [junit4]   2> NOTE: All tests run in this JVM: [TestRangeTree]
        
        Show
        Steve Rowe added a comment - Seeing 100% reproducible failure on branch_5x: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRangeTree -Dtests.method=testMultiValued -Dtests.seed=FD1D848DDE038459 -Dtests.slow=true -Dtests.locale=hr_HR -Dtests.timezone=Europe/Madrid -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] ERROR 0.05s J8 | TestRangeTree.testMultiValued <<< [junit4] > Throwable #1: java.lang.IllegalArgumentException: maxValuesSortInHeap must be >= maxValuesInLeafNode; got 1250 vs maxValuesInLeafNode=2013 [junit4] > at __randomizedtesting.SeedInfo.seed([FD1D848DDE038459:293DE0BF10C1C411]:0) [junit4] > at org.apache.lucene.rangetree.RangeTreeWriter.verifyParams(RangeTreeWriter.java:114) [junit4] > at org.apache.lucene.rangetree.RangeTreeDocValuesFormat.<init>(RangeTreeDocValuesFormat.java:98) [junit4] > at org.apache.lucene.rangetree.TestRangeTree.testMultiValued(TestRangeTree.java:128) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=hr_HR, timezone=Europe/Madrid [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=441475104,total=504365056 [junit4] 2> NOTE: All tests run in this JVM: [TestRangeTree]
        Hide
        Michael McCandless added a comment -

        Hmm, I'll dig, thanks Steve Rowe.

        Show
        Michael McCandless added a comment - Hmm, I'll dig, thanks Steve Rowe .
        Hide
        ASF subversion and git services added a comment -

        Commit 1694808 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1694808 ]

        LUCENE-6697: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1694808 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1694808 ] LUCENE-6697 : fix test bug
        Hide
        ASF subversion and git services added a comment -

        Commit 1694809 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1694809 ]

        LUCENE-6697: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1694809 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1694809 ] LUCENE-6697 : fix test bug
        Hide
        Michael McCandless added a comment -

        I committed a fix ... thanks Steve Rowe

        Show
        Michael McCandless added a comment - I committed a fix ... thanks Steve Rowe
        Hide
        Steve Rowe added a comment -

        Mike, the test fix should be committed to the lucene_solr_5_3 branch too, shouldn't it?

        Show
        Steve Rowe added a comment - Mike, the test fix should be committed to the lucene_solr_5_3 branch too, shouldn't it?
        Hide
        Michael McCandless added a comment -

        Oh yeah I'll backport to 5.3.x ... why don't we have an RC0 here yet

        Show
        Michael McCandless added a comment - Oh yeah I'll backport to 5.3.x ... why don't we have an RC0 here yet
        Hide
        ASF subversion and git services added a comment -

        Commit 1694814 from Michael McCandless in branch 'dev/branches/lucene_solr_5_3'
        [ https://svn.apache.org/r1694814 ]

        LUCENE-6697: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1694814 from Michael McCandless in branch 'dev/branches/lucene_solr_5_3' [ https://svn.apache.org/r1694814 ] LUCENE-6697 : fix test bug
        Hide
        Steve Rowe added a comment -

        Not sure if this warrants reopen or a new issue given the 5.3 release in process, so just leaving a comment:

        My Jenkins found a failing seed on branch_5x w/Java8 for TestRangeTree.testAllEqual() that reproduces rarely (http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/1288/) - beasting triggered it reliably for me once or twice on the first round with 10 dups, on the same box:

         [beaster] Beast round: 1
           [junit4] Duplicate suite name used with XML reports: org.apache.lucene.rangetree.TestRangeTree. This may confuse tools that process XML reports. Set 'ignoreDuplicateSuites' to true to skip this message.
          [beaster] Executing 10 suites with 10 JVMs.
          [beaster] 
          [beaster] Started J0 PID(12439@localhost).
          [beaster] Started J2 PID(12438@localhost).
          [beaster] Started J5 PID(12398@localhost).
          [beaster] Started J6 PID(12409@localhost).
          [beaster] Started J8 PID(12414@localhost).
          [beaster] Started J9 PID(12437@localhost).
          [beaster] Started J4 PID(12402@localhost).
          [beaster] Started J3 PID(12430@localhost).
          [beaster] Started J7 PID(12400@localhost).
          [beaster] Started J1 PID(12397@localhost).
          [beaster]   2> Aug 14, 2015 9:24:01 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
          [beaster]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestRangeTree]
          [beaster]   2> java.lang.AssertionError: T0: iter=2 id=11552 docID=11478 value=-4284574795410342043 (range: -7914835514817223401 TO -3409576459183210390) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:[-7914835514817223401 TO -3409576459183210390]
          [beaster]   2>        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
          [beaster]   2>        at org.junit.Assert.fail(Assert.java:93)
          [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
          [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
          [beaster]   2> 
          [beaster]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testAllEqual -Dtests.seed=A757E5A079F28A1B -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Canada/Eastern -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          [beaster] [09:23:57.007] ERROR   6.34s J5  | TestRangeTree.testAllEqual <<<
          [beaster]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=22, name=T0, state=RUNNABLE, group=TGRP-TestRangeTree]
          [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B:75B730E5DA2C13B5]:0)
          [beaster]    > Caused by: java.lang.AssertionError: T0: iter=2 id=11552 docID=11478 value=-4284574795410342043 (range: -7914835514817223401 TO -3409576459183210390) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:[-7914835514817223401 TO -3409576459183210390]
          [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
          [beaster]    >        at org.junit.Assert.fail(Assert.java:93)
          [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
          [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
          [beaster]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=de_GR, timezone=Canada/Eastern
          [beaster]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=471275456,total=522715136
          [beaster]   2> NOTE: All tests run in this JVM: [TestRangeTree]
          [beaster]   2> Aug 14, 2015 9:24:02 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
          [beaster]   2> WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestRangeTree]
          [beaster]   2> java.lang.AssertionError: T2: iter=3 id=10830 docID=10756 value=-4284574795410342043 (range: -6351025883581717437 TO 7103712746369706722) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:{-6351025883581717437 TO 7103712746369706722]
          [beaster]   2>        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
          [beaster]   2>        at org.junit.Assert.fail(Assert.java:93)
          [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
          [beaster]   2>        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
          [beaster]   2> 
          [beaster]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testAllEqual -Dtests.seed=A757E5A079F28A1B -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Canada/Eastern -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          [beaster] [09:23:57.229] ERROR   7.52s J6  | TestRangeTree.testAllEqual <<<
          [beaster]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=24, name=T2, state=RUNNABLE, group=TGRP-TestRangeTree]
          [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B:75B730E5DA2C13B5]:0)
          [beaster]    > Caused by: java.lang.AssertionError: T2: iter=3 id=10830 docID=10756 value=-4284574795410342043 (range: -6351025883581717437 TO 7103712746369706722) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:{-6351025883581717437 TO 7103712746369706722]
          [beaster]    >        at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0)
          [beaster]    >        at org.junit.Assert.fail(Assert.java:93)
          [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502)
          [beaster]    >        at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433)
          [beaster]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=de_GR, timezone=Canada/Eastern
          [beaster]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=462995928,total=522715136
          [beaster]   2> NOTE: All tests run in this JVM: [TestRangeTree]
          [beaster] 
          [beaster] Tests with failures:
          [beaster]   - org.apache.lucene.rangetree.TestRangeTree.testAllEqual
          [beaster]   - org.apache.lucene.rangetree.TestRangeTree.testAllEqual
        
        Show
        Steve Rowe added a comment - Not sure if this warrants reopen or a new issue given the 5.3 release in process, so just leaving a comment: My Jenkins found a failing seed on branch_5x w/Java8 for TestRangeTree.testAllEqual() that reproduces rarely ( http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/1288/ ) - beasting triggered it reliably for me once or twice on the first round with 10 dups, on the same box: [beaster] Beast round: 1 [junit4] Duplicate suite name used with XML reports: org.apache.lucene.rangetree.TestRangeTree. This may confuse tools that process XML reports. Set 'ignoreDuplicateSuites' to true to skip this message. [beaster] Executing 10 suites with 10 JVMs. [beaster] [beaster] Started J0 PID(12439@localhost). [beaster] Started J2 PID(12438@localhost). [beaster] Started J5 PID(12398@localhost). [beaster] Started J6 PID(12409@localhost). [beaster] Started J8 PID(12414@localhost). [beaster] Started J9 PID(12437@localhost). [beaster] Started J4 PID(12402@localhost). [beaster] Started J3 PID(12430@localhost). [beaster] Started J7 PID(12400@localhost). [beaster] Started J1 PID(12397@localhost). [beaster] 2> Aug 14, 2015 9:24:01 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [beaster] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestRangeTree] [beaster] 2> java.lang.AssertionError: T0: iter=2 id=11552 docID=11478 value=-4284574795410342043 (range: -7914835514817223401 TO -3409576459183210390) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:[-7914835514817223401 TO -3409576459183210390] [beaster] 2> at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0) [beaster] 2> at org.junit.Assert.fail(Assert.java:93) [beaster] 2> at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502) [beaster] 2> at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433) [beaster] 2> [beaster] 2> NOTE: reproduce with: ant test -Dtestcase=TestRangeTree -Dtests.method=testAllEqual -Dtests.seed=A757E5A079F28A1B -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Canada/Eastern -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [beaster] [09:23:57.007] ERROR 6.34s J5 | TestRangeTree.testAllEqual <<< [beaster] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=22, name=T0, state=RUNNABLE, group=TGRP-TestRangeTree] [beaster] > at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B:75B730E5DA2C13B5]:0) [beaster] > Caused by: java.lang.AssertionError: T0: iter=2 id=11552 docID=11478 value=-4284574795410342043 (range: -7914835514817223401 TO -3409576459183210390) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:[-7914835514817223401 TO -3409576459183210390] [beaster] > at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0) [beaster] > at org.junit.Assert.fail(Assert.java:93) [beaster] > at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502) [beaster] > at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433) [beaster] 2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=de_GR, timezone=Canada/Eastern [beaster] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=471275456,total=522715136 [beaster] 2> NOTE: All tests run in this JVM: [TestRangeTree] [beaster] 2> Aug 14, 2015 9:24:02 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [beaster] 2> WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestRangeTree] [beaster] 2> java.lang.AssertionError: T2: iter=3 id=10830 docID=10756 value=-4284574795410342043 (range: -6351025883581717437 TO 7103712746369706722) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:{-6351025883581717437 TO 7103712746369706722] [beaster] 2> at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0) [beaster] 2> at org.junit.Assert.fail(Assert.java:93) [beaster] 2> at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502) [beaster] 2> at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433) [beaster] 2> [beaster] 2> NOTE: reproduce with: ant test -Dtestcase=TestRangeTree -Dtests.method=testAllEqual -Dtests.seed=A757E5A079F28A1B -Dtests.slow=true -Dtests.locale=de_GR -Dtests.timezone=Canada/Eastern -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [beaster] [09:23:57.229] ERROR 7.52s J6 | TestRangeTree.testAllEqual <<< [beaster] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=24, name=T2, state=RUNNABLE, group=TGRP-TestRangeTree] [beaster] > at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B:75B730E5DA2C13B5]:0) [beaster] > Caused by: java.lang.AssertionError: T2: iter=3 id=10830 docID=10756 value=-4284574795410342043 (range: -6351025883581717437 TO 7103712746369706722) expected true but got: false deleted?=false query=NumericRangeTreeQuery:field=sn_value:{-6351025883581717437 TO 7103712746369706722] [beaster] > at __randomizedtesting.SeedInfo.seed([A757E5A079F28A1B]:0) [beaster] > at org.junit.Assert.fail(Assert.java:93) [beaster] > at org.apache.lucene.rangetree.TestRangeTree$4._run(TestRangeTree.java:502) [beaster] > at org.apache.lucene.rangetree.TestRangeTree$4.run(TestRangeTree.java:433) [beaster] 2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=de_GR, timezone=Canada/Eastern [beaster] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=462995928,total=522715136 [beaster] 2> NOTE: All tests run in this JVM: [TestRangeTree] [beaster] [beaster] Tests with failures: [beaster] - org.apache.lucene.rangetree.TestRangeTree.testAllEqual [beaster] - org.apache.lucene.rangetree.TestRangeTree.testAllEqual
        Hide
        Michael McCandless added a comment -

        Thanks Steve Rowe, I'm also unsure! But I'll dig ...

        Show
        Michael McCandless added a comment - Thanks Steve Rowe , I'm also unsure! But I'll dig ...
        Hide
        Steve Rowe added a comment -

        Another failing test in the TestRangeTree suite, didn't try to repro, from http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java7/1519/:

           [junit4] Suite: org.apache.lucene.rangetree.TestRangeTree
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testMultiValuedSortedSet -Dtests.seed=824523825E862EFE -Dtests.slow=true -Dtests.locale=en -Dtests.timezone=SystemV/EST5 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] ERROR   0.09s J0 | TestRangeTree.testMultiValuedSortedSet <<<
           [junit4]    > Throwable #1: java.lang.IllegalArgumentException: maxValuesSortInHeap must be >= maxValuesInLeafNode; got 1552 vs maxValuesInLeafNode=1624
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([824523825E862EFE:F93C7006ECC25330]:0)
           [junit4]    > 	at org.apache.lucene.rangetree.RangeTreeWriter.verifyParams(RangeTreeWriter.java:114)
           [junit4]    > 	at org.apache.lucene.rangetree.RangeTreeDocValuesFormat.<init>(RangeTreeDocValuesFormat.java:98)
           [junit4]    > 	at org.apache.lucene.rangetree.TestRangeTree.testMultiValuedSortedSet(TestRangeTree.java:215)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
        
        Show
        Steve Rowe added a comment - Another failing test in the TestRangeTree suite, didn't try to repro, from http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java7/1519/ : [junit4] Suite: org.apache.lucene.rangetree.TestRangeTree [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRangeTree -Dtests.method=testMultiValuedSortedSet -Dtests.seed=824523825E862EFE -Dtests.slow=true -Dtests.locale=en -Dtests.timezone=SystemV/EST5 -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.09s J0 | TestRangeTree.testMultiValuedSortedSet <<< [junit4] > Throwable #1: java.lang.IllegalArgumentException: maxValuesSortInHeap must be >= maxValuesInLeafNode; got 1552 vs maxValuesInLeafNode=1624 [junit4] > at __randomizedtesting.SeedInfo.seed([824523825E862EFE:F93C7006ECC25330]:0) [junit4] > at org.apache.lucene.rangetree.RangeTreeWriter.verifyParams(RangeTreeWriter.java:114) [junit4] > at org.apache.lucene.rangetree.RangeTreeDocValuesFormat.<init>(RangeTreeDocValuesFormat.java:98) [junit4] > at org.apache.lucene.rangetree.TestRangeTree.testMultiValuedSortedSet(TestRangeTree.java:215) [junit4] > at java.lang.Thread.run(Thread.java:745)
        Hide
        Michael McCandless added a comment -

        Thanks Steve Rowe, I'll fix ... that one is much easier

        Show
        Michael McCandless added a comment - Thanks Steve Rowe , I'll fix ... that one is much easier
        Hide
        ASF subversion and git services added a comment -

        Commit 1696181 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1696181 ]

        LUCENE-6697: fix test bugs

        Show
        ASF subversion and git services added a comment - Commit 1696181 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1696181 ] LUCENE-6697 : fix test bugs
        Hide
        ASF subversion and git services added a comment -

        Commit 1696182 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1696182 ]

        LUCENE-6697: fix test bugs

        Show
        ASF subversion and git services added a comment - Commit 1696182 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1696182 ] LUCENE-6697 : fix test bugs
        Hide
        Michael McCandless added a comment -

        My Jenkins found a failing seed on branch_5x w/Java8 for TestRangeTree.testAllEqual()

        This was a fun one to track down

        https://issues.apache.org/jira/browse/LUCENE-6745 is the root cause...

        Show
        Michael McCandless added a comment - My Jenkins found a failing seed on branch_5x w/Java8 for TestRangeTree.testAllEqual() This was a fun one to track down https://issues.apache.org/jira/browse/LUCENE-6745 is the root cause...
        Hide
        ASF subversion and git services added a comment -

        Commit 1697046 from Michael McCandless in branch 'dev/branches/lucene6699'
        [ https://svn.apache.org/r1697046 ]

        LUCENE-6697: never use the original IndexInput for real IO

        Show
        ASF subversion and git services added a comment - Commit 1697046 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697046 ] LUCENE-6697 : never use the original IndexInput for real IO
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development