Apache Drill / DRILL-7570

Fix unstable statistics tests


    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.17.0
    • Fix Version/s: 1.18.0
    • Component/s: None
    • Labels:

      Description

      Drill contains tests that check whether statistics are applied; some of them also use sampling to calculate the statistics values.

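      Sampling adds a limit above the scan, and the tests verify that statistics were applied by checking the estimated row count in the plan. A limit without sorting does not guarantee which rows are returned, so the sampled statistics, and with them the estimated row count, can differ between runs. As a result, these tests may fail intermittently: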

      [ERROR]   TestMetastoreCommands.testAnalyzeWithSampleStatistics:2739 Did not find expected pattern in plan: Filter\(condition.*\).*rowcount = 96.25,
      00-00    Screen : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2530.5 rows, 7570.5 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336738
      00-01      Project(employee_id=[$1]) : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2520.0 rows, 7560.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336737
      00-02        SelectionVectorRemover : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2415.0 rows, 7455.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336736
      00-03          Filter(condition=[=($0, 2)]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2310.0 rows, 7350.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336735
      00-04            Scan(table=[[dfs, tmp, employeeWithStatsFile]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile/0_0_0.parquet]], selectionRoot=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile, numFiles=1, numRowGroups=1, usedMetadataFile=false, usedMetastore=true, filter=equal(`department_id`, 2) , columns=[`department_id`, `employee_id`]]]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 1155.0, cumulative cost = {1155.0 rows, 2310.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336734
       expected:<true> but was:<false> 
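
      The two row counts in the failure above (96.25 expected, 105.0 found) are both exact fractions of the scan row count of 1155.0: 1155 / 12 and 1155 / 11. The sketch below only checks that arithmetic. Reading the divisor as a sampled number of distinct values (NDV) for department_id, with an equality-filter selectivity of 1 / NDV, is an assumption inferred from the numbers and is not stated in this report; the class and method names are hypothetical.

```java
class RowCountCheck {
    // Estimated filter output rows, ASSUMING selectivity = 1 / ndv for an equality filter.
    static double estimate(double scanRows, long ndv) {
        return scanRows / ndv;
    }

    public static void main(String[] args) {
        double scanRows = 1155.0; // rowcount reported by the Scan operator in the plan

        // Row count the test pattern expects.
        System.out.println(estimate(scanRows, 12)); // 96.25

        // Row count the Filter operator actually reported in the failing run.
        System.out.println(estimate(scanRows, 11)); // 105.0
    }
}
```

      Under that assumption, a sample that misses a single distinct value is enough to move the estimate from 96.25 to 105.0.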

      List of tests to fix:

      • TestMetastoreCommands.testAnalyzeWithSampleStatistics;
      • TestAnalyze.basic3.
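
      The instability described above can be reproduced in miniature. The sketch below is not Drill code and does not claim to show how the tests were fixed: it uses hypothetical data and helper names to show that counting distinct values over the first N rows depends on arrival order, while sorting before applying the limit gives the same answer for any arrival order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

class SampleNdvSketch {
    // Hypothetical data: department ids 1..11 repeated 10 times, then a single 12.
    static List<Integer> sampleRows() {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 110; i++) {
            rows.add(1 + i % 11);
        }
        rows.add(12);
        return rows;
    }

    // Distinct values among the first `limit` rows, mimicking a limit placed above a scan.
    static int ndvOfFirstRows(List<Integer> rows, int limit) {
        int end = Math.min(limit, rows.size());
        return new HashSet<>(rows.subList(0, end)).size();
    }

    // Same as above, but sorts a copy first so the result ignores arrival order.
    static int ndvOfFirstRowsSorted(List<Integer> rows, int limit) {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return ndvOfFirstRows(sorted, limit);
    }

    public static void main(String[] args) {
        List<Integer> orderA = sampleRows();
        List<Integer> orderB = new ArrayList<>(orderA);
        Collections.reverse(orderB); // same rows, different arrival order

        // Limit without sorting: the sample, and so the NDV, depends on arrival order.
        System.out.println(ndvOfFirstRows(orderA, 50)); // 11
        System.out.println(ndvOfFirstRows(orderB, 50)); // 12

        // Limit after sorting: both orders produce the same sample.
        System.out.println(ndvOfFirstRowsSorted(orderA, 50)
                == ndvOfFirstRowsSorted(orderB, 50)); // true
    }
}
```

      Sorting makes the sample repeatable but not representative, so a test stabilized this way should assert only that statistics were applied and not that the estimate is accurate.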

            People

            • Assignee: Vova Vysotskyi (volodymyr)
            • Reporter: Vova Vysotskyi (volodymyr)
            • Reviewer: Arina Ielchiieva
