IMPALA / IMPALA-9560

Changing version from 3.4.0-SNAPSHOT to 3.4.0-RELEASE breaks TestStatsExtrapolation

    Details

      Description

      When working on the Impala 3.4 release, we changed the version on branch-3.4.0 from 3.4.0-SNAPSHOT to 3.4.0-RELEASE. 

      metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation() now fails with the following error:

      metadata/test_stats_extrapolation.py:44: in test_stats_extrapolation
          self.run_test_case('QueryTest/stats-extrapolation', vector, unique_database)
      common/impala_test_suite.py:690: in run_test_case
          self.__verify_results_and_errors(vector, test_section, result, use_db)
      common/impala_test_suite.py:523: in __verify_results_and_errors
          replace_filenames_with_placeholder)
      common/test_result_verifier.py:456: in verify_raw_results
          VERIFIER_MAP[verifier](expected, actual)
      common/test_result_verifier.py:246: in verify_query_result_is_subset
          assert expected_literal_strings <= actual_literal_strings
      E   assert Items in expected results not found in actual results:
      E     '   tuple-ids=0 row-size=4B cardinality=17.91K'
      E     Items in actual results:
      E     '|  output exprs: id'
      E     ''
      E     '     table: rows=unavailable size=unavailable'
      E     '   stored statistics:'
      E     'Max Per-Host Resource Reservation: Memory=8.00KB Threads=2'
      E     '     columns: unavailable'
      E     '     partitions: 0/24 rows=unavailable'
      E     '00:SCAN HDFS [test_stats_extrapolation_5c6bdfd.alltypes]'
      E     '   tuple-ids=0 row-size=4B cardinality=17.90K'
      E     '|'
      E     'Analyzed query: SELECT id FROM test_stats_extrapolation_5c6bdfd.alltypes'
      E     'F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1'
      E     '   HDFS partitions=24/24 files=36 size=281.43KB'
      E     'test_stats_extrapolation_5c6bdfd.alltypes'
      E     'PLAN-ROOT SINK'
      E     '|  mem-estimate=0B mem-reservation=0B thread-reservation=0'
      E     '|  Per-Host Resources: mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=2'
      E     '   in pipelines: 00(GETNEXT)'
      E     '   extrapolated-rows=unavailable max-scan-range-rows=unavailable'
      E     'Per-Host Resource Estimates: Memory=16MB'
      E     'WARNING: The following tables are missing relevant table and/or column statistics.'
      E     '   mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1'

      The expected output contains a cardinality of 17.91K, but the actual plan reports 17.90K.

      The string "RELEASE" is one character shorter than "SNAPSHOT". The version string gets embedded in the Parquet files, so each file is slightly smaller than before. The cardinality estimate in this test is extrapolated from the size of the Parquet files, and the value apparently sits right on a rounding boundary, so this tiny size change is enough to move the displayed estimate from 17.91K to 17.90K.
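      To illustrate how a few bytes can flip the displayed value, here is a minimal Python sketch. It assumes the estimate scales linearly with file size (a rows-per-byte ratio from stored statistics multiplied by the current byte count), that the result is rounded to a whole row, and that the plan prints cardinality in thousands with two decimals. The function names and all numbers are hypothetical; this is not Impala's planner code and not this test's actual statistics.

```python
def extrapolate_rows(stored_rows, stored_bytes, current_bytes):
    """Hypothetical linear extrapolation: rows scale with bytes on disk."""
    rows_per_byte = stored_rows / float(stored_bytes)
    # Assumed: round the estimate to a whole number of rows.
    return int(round(rows_per_byte * current_bytes))


def format_cardinality(rows):
    """Assumed plan display format: thousands with two decimals, e.g. 17.91K."""
    return "%.2fK" % (rows / 1000.0)


# Hypothetical inputs: the same data, once at the stored size and once 36 bytes smaller.
before = extrapolate_rows(stored_rows=17906, stored_bytes=288036, current_bytes=288036)
after = extrapolate_rows(stored_rows=17906, stored_bytes=288036, current_bytes=288000)
print(before, format_cardinality(before))  # 17906 17.91K
print(after, format_cardinality(after))    # 17904 17.90K
```

      With these made-up inputs, shrinking the data by 36 bytes lowers the estimate by only two rows, but that is enough to cross the boundary between 17.91K and 17.90K.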

      This test should tolerate this difference.
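      One way to tolerate the difference is to compare cardinalities numerically within a small relative tolerance instead of as literal strings. The sketch below is hypothetical: the helper names, the 0.1% default tolerance, and the assumed unit suffixes (none, K, M) are illustrative and are not part of Impala's test framework.

```python
import re

# Matches the estimate in plan lines such as "tuple-ids=0 row-size=4B cardinality=17.91K".
_CARDINALITY_RE = re.compile(r"cardinality=([0-9]+(?:\.[0-9]+)?)([KM]?)")
# Assumed unit suffixes; extend if the plan uses others.
_SCALE = {"": 1.0, "K": 1e3, "M": 1e6}


def parse_cardinality(line):
    """Return the numeric cardinality from a plan line (hypothetical helper)."""
    match = _CARDINALITY_RE.search(line)
    if match is None:
        raise ValueError("no cardinality found in %r" % line)
    return float(match.group(1)) * _SCALE[match.group(2)]


def cardinality_close(expected_line, actual_line, rel_tol=1e-3):
    """True if the two estimates differ by at most rel_tol, relative to the larger one."""
    expected = parse_cardinality(expected_line)
    actual = parse_cardinality(actual_line)
    return abs(expected - actual) <= rel_tol * max(abs(expected), abs(actual))


expected = "   tuple-ids=0 row-size=4B cardinality=17.91K"
actual = "   tuple-ids=0 row-size=4B cardinality=17.90K"
print(cardinality_close(expected, actual))                # True
print(cardinality_close(expected, actual, rel_tol=1e-4))  # False
```

      Applied to the two lines from the failure above, the check accepts 17.91K vs 17.90K (a difference of about 0.06%), while a stricter rel_tol=1e-4 rejects it. A relative tolerance keeps the check meaningful regardless of table size, whereas matching an explicit list of acceptable literal values would have to be updated whenever the data changes.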

            People

            • Assignee:
              Joe McDonnell (joemcdonnell)
              Reporter:
              Joe McDonnell (joemcdonnell)
