Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7570

Fix unstable statistics tests

    XMLWordPrintableJSON

Details

    • Task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.17.0
    • 1.18.0
    • None

    Description

      Drill contains tests for checking that statistics is applied, some of them also use sampling to calculate statistics value.

      Sampling adds limit above scan, but tests check the value of the estimated row count to verify that statistics were applied. limit without sorting doesn't guarantee consistent results, so these tests may fail sometime:

      [ERROR]   TestMetastoreCommands.testAnalyzeWithSampleStatistics:2739 Did not find expected pattern in plan: Filter\(condition.*\).*rowcount = 96.25,
      00-00    Screen : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2530.5 rows, 7570.5 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336738
      00-01      Project(employee_id=[$1]) : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2520.0 rows, 7560.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336737
      00-02        SelectionVectorRemover : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2415.0 rows, 7455.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336736
      00-03          Filter(condition=[=($0, 2)]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2310.0 rows, 7350.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336735
      00-04            Scan(table=[[dfs, tmp, employeeWithStatsFile]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile/0_0_0.parquet]], selectionRoot=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile, numFiles=1, numRowGroups=1, usedMetadataFile=false, usedMetastore=true, filter=equal(`department_id`, 2) , columns=[`department_id`, `employee_id`]]]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 1155.0, cumulative cost = {1155.0 rows, 2310.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336734
       expected:<true> but was:<false> 

      List of tests to fix:

      • TestMetastoreCommands.testAnalyzeWithSampleStatistics;
      • TestAnalyze.basic3.

      Attachments

        Issue Links

          Activity

            People

              volodymyr Vova Vysotskyi
              volodymyr Vova Vysotskyi
              Arina Ielchiieva Arina Ielchiieva
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: