  1. Apache Arrow
  2. ARROW-5680

[Rust] DataFusion group-by tests depend on result set order

    Details

      Description

      See https://circleci.com/gh/ursa-labs/crossbow/223?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

      Once I properly export ARROW_TEST_DATA and PARQUET_TEST_DATA, I get further failures, e.g.

      running 18 tests
      test csv_query_group_by_int_min_max ... FAILED
      test csv_query_external_table_count ... ok
      test csv_query_count ... ok
      test csv_count_star ... ok
      test csv_query_avg ... ok
      test csv_query_avg_multi_batch ... ok
      test csv_query_cast ... ok
      test csv_query_group_by_avg ... FAILED
      test csv_query_group_by_string_min_max ... FAILED
      test csv_query_group_by_int_count ... FAILED
      test csv_query_limit ... ok
      test csv_query_limit_bigger_than_nbr_of_rows ... ok
      test csv_query_limit_with_same_nbr_of_rows ... ok
      test csv_query_cast_literal ... ok
      test csv_query_limit_zero ... ok
      test csv_query_create_external_table ... ok
      test csv_query_with_predicate ... ok
      test parquet_query ... ok
      
      failures:
      
      ---- csv_query_group_by_int_min_max stdout ----
      thread 'csv_query_group_by_int_min_max' panicked at 'assertion failed: `(left == right)`
        left: `"4\t0.02182578039211991\t0.9237877978193884\n5\t0.01479305307777301\t0.9723580396501548\n2\t0.16301110515739792\t0.991517828651004\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`,
       right: `"4\t0.02182578039211991\t0.9237877978193884\n2\t0.16301110515739792\t0.991517828651004\n5\t0.01479305307777301\t0.9723580396501548\n3\t0.047343434291126085\t0.9293883502480845\n1\t0.05636955101974106\t0.9965400387585364\n"`', datafusion/tests/sql.rs:77:5
      note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
      
      ---- csv_query_group_by_avg stdout ----
      thread 'csv_query_group_by_avg' panicked at 'assertion failed: `(left == right)`
        left: `"\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n"`,
       right: `"\"d\"\t0.48855379387549824\n\"c\"\t0.6600456536439784\n\"b\"\t0.41040709263815384\n\"a\"\t0.48754517466109415\n\"e\"\t0.48600669271341534\n"`', datafusion/tests/sql.rs:99:5
      
      ---- csv_query_group_by_string_min_max stdout ----
      thread 'csv_query_group_by_string_min_max' panicked at 'assertion failed: `(left == right)`
        left: `"\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.01479305307777301\t0.9965400387585364\n\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n"`,
       right: `"\"d\"\t0.061029375346466685\t0.9748360509016578\n\"c\"\t0.0494924465469434\t0.991517828651004\n\"b\"\t0.04893135681998029\t0.9185813970744787\n\"a\"\t0.02182578039211991\t0.9800193410444061\n\"e\"\t0.01479305307777301\t0.9965400387585364\n"`', datafusion/tests/sql.rs:187:5
      
      ---- csv_query_group_by_int_count stdout ----
      thread 'csv_query_group_by_int_count' panicked at 'assertion failed: `(left == right)`
        left: `"\"a\"\t21\n\"e\"\t21\n\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n"`,
       right: `"\"d\"\t18\n\"c\"\t21\n\"b\"\t19\n\"a\"\t21\n\"e\"\t21\n"`', datafusion/tests/sql.rs:175:5
      

      I suspect that the tests expect the group-by results in a fixed order. That order would depend heavily on the iteration order of the hash table. Note that once I did a rustup update (and docker rmi rustlangrust/nightly), the failures went away.
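
      One possible way to make such assertions independent of hash table iteration order is to sort the rows on both sides before comparing. The following is only a minimal sketch under that assumption, not the fix that was applied in DataFusion; `sorted_rows` is a hypothetical helper name, and it assumes the test output is a newline-delimited string of tab-separated rows, as in the failures above.

```rust
/// Hypothetical helper (not part of DataFusion): split newline-delimited
/// query output into rows and sort them lexicographically, so that two
/// outputs with the same rows in a different order compare equal.
fn sorted_rows(output: &str) -> Vec<&str> {
    let mut rows: Vec<&str> = output.lines().collect();
    rows.sort_unstable();
    rows
}

/// Hypothetical assertion wrapper: panics only if the *set of rows*
/// differs, regardless of the order in which the rows were produced.
fn assert_same_rows(actual: &str, expected: &str) {
    assert_eq!(sorted_rows(actual), sorted_rows(expected));
}
```

      With the two strings from the csv_query_group_by_int_count failure, `assert_same_rows(left, right)` would pass even though `left == right` is false. An alternative would be to add an ORDER BY clause to the query itself, assuming the engine supports it.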

              People

              • Assignee:
                andygrove Andy Grove
                Reporter:
                fsaintjacques Francois Saint-Jacques

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h