Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.3
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      java8, branch_5x & trunk, r1686892

      Original failure here (Linux): <http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/144/>

      Reproduces for me on OS X 10.10 on branch_5x, but on trunk failed to reproduce the one time I tried it.

         [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestCloudPivotFacet -Dtests.method=test -Dtests.seed=2701C0115CD1BF95 -Dtests.slow=true -Dtests.locale=ja_JP -Dtests.timezone=America/Scoresbysund -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
         [junit4] FAILURE 42.6s | TestCloudPivotFacet.test <<<
         [junit4]    > Throwable #1: java.lang.AssertionError: {main(facet=true&facet.pivot=%7B%21stats%3Dst1%7Dpivot_y_s1%2Cdense_pivot_y_s%2Cdense_pivot_ti1&facet.limit=9&facet.sort=index),extra(rows=0&q=id%3A%5B*+TO+304%5D&stats=true&stats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tf1&stats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_x_s&stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_ti1&_test_sort=index)} ==> Mean of sk1 => pivot_y_s1,dense_pivot_y_s,dense_pivot_ti1: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+304%5D&stats=true&stats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tf1&stats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_x_s&stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_ti1&_test_sort=index),extra(fq=%7B%21term+f%3Dpivot_y_s1%7Dh)})} expected:<-1.4136202738271603E8> but was:<-1.4136202738271606E8>
         [junit4]    > 	at __randomizedtesting.SeedInfo.seed([2701C0115CD1BF95:AF55FFCBF22DD26D]:0)
         [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:281)
         [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.test(TestCloudPivotFacet.java:228)
         [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:960)
         [junit4]    > 	at org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:935)
         [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
         [junit4]    > Caused by: java.lang.AssertionError: Mean of sk1 => pivot_y_s1,dense_pivot_y_s,dense_pivot_ti1: {params(rows=0),defaults({main(rows=0&q=id%3A%5B*+TO+304%5D&stats=true&stats.field=%7B%21key%3Dsk1+tag%3Dst1%2Cst2%7Dpivot_tf1&stats.field=%7B%21key%3Dsk2+tag%3Dst2%2Cst3%7Dpivot_x_s&stats.field=%7B%21key%3Dsk3+tag%3Dst3%2Cst4%7Ddense_pivot_ti1&_test_sort=index),extra(fq=%7B%21term+f%3Dpivot_y_s1%7Dh)})} expected:<-1.4136202738271603E8> but was:<-1.4136202738271606E8>
         [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotStats(TestCloudPivotFacet.java:383)
         [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotData(TestCloudPivotFacet.java:339)
         [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:302)
         [junit4]    > 	at org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:271)
         [junit4]    > 	... 42 more
       
      1. SOLR-7712.patch
        8 kB
        Hoss Man
      2. SOLR-7712.patch
        7 kB
        Hoss Man

        Activity

        Hoss Man added a comment -

        TL;DR: I think this test needs some small deltas when doing float/double comparisons, but I want to think about it a bit more.


        FYI: I couldn't immediately reproduce that seed at r1686892 on 5x; the first time I tried to beast it, it made it 5 iterations before it failed with the exact same numeric values.

        What this test is doing is executing a pivot facet request with stats enabled, then executing a "drill down" query on each pivot field value (by adding an fq) and confirming that the numResults matches the previously returned pivot count, and that the top level stats of the drill down query equal the previously returned per-pivot stats.

        I remember when writing this test that I expected I would need to do some 3 arg asserts on the floating point stat values to account for a small epsilon that might be introduced in the floating point math because of the different order of computation, but when I ran the initial straw man test I never got any errors.
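
        As an illustration of the kind of tolerant comparison described above (a minimal sketch, not the code from the SOLR-7712 patch): JUnit's three-argument assertEquals(expected, actual, delta) takes an absolute delta, which is awkward for values around 1.4E8, so the hypothetical helper nearlyEqual below scales the tolerance by the magnitude of the operands. The class name, the helper name, and the 1e-12 tolerance are all assumptions chosen for the example; the two doubles are the ones from the failure message in the description.

```java
// Hypothetical helper for illustration only; not taken from the SOLR-7712 patch.
final class ApproxDoubleCompare {

    /** True when a and b differ by at most relEpsilon times the larger magnitude. */
    static boolean nearlyEqual(double a, double b, double relEpsilon) {
        if (a == b) {
            return true; // covers exact matches, equal infinities, and +0.0 vs -0.0
        }
        if (Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // an infinity only matches itself (handled above)
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        // If either value is NaN the comparison below is false, so NaN never matches.
        return Math.abs(a - b) <= relEpsilon * scale;
    }

    public static void main(String[] args) {
        // The two "mean" values reported in the failure message.
        double expected = -1.4136202738271603E8;
        double actual = -1.4136202738271606E8;

        System.out.println("exact match:  " + (expected == actual));
        System.out.println("within 1e-12: " + nearlyEqual(expected, actual, 1e-12));
    }
}
```

        A relative tolerance is only one option; a fixed absolute delta, or a bound expressed in units of Math.ulp, would also absorb a difference of this size.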

        I also remember working through the problem in my head for a few minutes to try and figure out why it wasn't failing, and figuring that regardless of whether it was top level stats or pivot stats, the same code was going to accumulate the (per-index) values in the same order (the test doesn't do any index updates concurrent to searching, for obvious reasons), so I just left it alone.

        Looking at these failures now, and adding some extra logging and thinking about it a bit more, I think we've just been getting lucky that this hasn't failed yet.

        I can think of 2 ways for this type of validation to fail on exact floating point comparison:

        1. if different replicas get hit by the different top level queries, or a different replica was used to refine a pivot than is used by the verification query, and those replicas have slightly different segments (due to recovery, or retry, or whatever), then the accumulation of the "sum" of a field for all the matching docs adds the numbers up in a different order.
        2. if there are more than 2 shards, and the 2 top level queries (or the refinement of a pivot query) get responses from the shards in a different order, then computing the aggregate "sum" involves adding numbers in a different order.

        If the "sum" is different, then the "mean" might be different as well (depending on precision loss due to dividing by the count).

        But most of the time we deal with numeric values that either don't lose precision when added up, or lose the same amount of precision either way.
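
        The order dependence described above can be reproduced in isolation. The following is a minimal sketch with made-up values (0.1, 0.2, 0.3 are not from the test data, and the class and method names are hypothetical): the same three doubles are summed in two different orders, standing in for per-shard or per-segment partial values arriving in a different sequence.

```java
// Illustrative only: shows that double addition is order sensitive.
final class SumOrderDemo {

    /** Left-to-right accumulation, the way a simple running "sum" stat would be built. */
    static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Mean derived from the accumulated sum, so it inherits any rounding in the sum. */
    static double mean(double[] values) {
        return sum(values) / values.length;
    }

    public static void main(String[] args) {
        // Same values, two different arrival orders.
        double[] orderA = {0.1, 0.2, 0.3};
        double[] orderB = {0.3, 0.2, 0.1};

        System.out.println("sum A  = " + sum(orderA));
        System.out.println("sum B  = " + sum(orderB));
        System.out.println("mean A = " + mean(orderA));
        System.out.println("mean B = " + mean(orderB));
        System.out.println("sums identical: " + (sum(orderA) == sum(orderB)));
    }
}
```

        The two sums differ only in the last bit, which is the same kind of discrepancy as in the reported failure, where the expected and actual means differ by about 3E-8 at a magnitude of 1.4E8.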

        Hoss Man added a comment -

        Here's a patch I'm currently beasting on both branches.

        Hoss Man added a comment -

        Bah... ignore that patch. I totally forgot we support stats on dates, and the handful of manual runs I did with that patch never triggered date stats. New patch soon.

        Hoss Man added a comment -

        Beasted this new patch 1500 times x2 last night (trunk + 5x).

        I'll keep beasting and plan on committing Monday.

        ASF subversion and git services added a comment -

        Commit 1688266 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1688266 ]

        SOLR-7712: fixed test to account for aggregate floating point precision loss

        ASF subversion and git services added a comment -

        Commit 1688267 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1688267 ]

        SOLR-7712: fixed test to account for aggregate floating point precision loss (merge r1688266)

        Hoss Man added a comment -

        thanks steve

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee:
            Hoss Man
            Reporter:
            Steve Rowe
          • Votes:
            0
            Watchers:
            2
