  1. Apache Drill
  2. DRILL-7570

Fix unstable statistics tests


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.17.0
    • Fix Version/s: 1.18.0
    • None

    Description

      Drill contains tests that check whether statistics are applied; some of them also use sampling to calculate the statistics values.

      Sampling adds a LIMIT above the scan, and the tests verify that statistics were applied by checking the estimated row count in the plan. A LIMIT without sorting does not guarantee which rows are returned, so the computed statistics, and with them the estimated row count, can differ between runs, and these tests may fail intermittently:

      [ERROR]   TestMetastoreCommands.testAnalyzeWithSampleStatistics:2739 Did not find expected pattern in plan: Filter\(condition.*\).*rowcount = 96.25,
      00-00    Screen : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2530.5 rows, 7570.5 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336738
      00-01      Project(employee_id=[$1]) : rowType = RecordType(ANY employee_id): rowcount = 105.0, cumulative cost = {2520.0 rows, 7560.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336737
      00-02        SelectionVectorRemover : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2415.0 rows, 7455.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336736
      00-03          Filter(condition=[=($0, 2)]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 105.0, cumulative cost = {2310.0 rows, 7350.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336735
      00-04            Scan(table=[[dfs, tmp, employeeWithStatsFile]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile/0_0_0.parquet]], selectionRoot=/home/runner/work/drill/drill/exec/java-exec/target/org.apache.drill.exec.sql.TestMetastoreCommands/dfsTestTmp/1580901135676-0/employeeWithStatsFile, numFiles=1, numRowGroups=1, usedMetadataFile=false, usedMetastore=true, filter=equal(`department_id`, 2) , columns=[`department_id`, `employee_id`]]]) : rowType = RecordType(ANY department_id, ANY employee_id): rowcount = 1155.0, cumulative cost = {1155.0 rows, 2310.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336734
       expected:<true> but was:<false> 
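
To make the failure above concrete, here is a minimal Java sketch of the kind of check that failed. It is not the Drill test helper: the class name `PlanPatternCheck` and both methods are hypothetical, and the regex is applied to the single Filter line copied from the plan rather than to the whole plan text. It only shows that the expected pattern is tied to one exact row count (96.25), while this run produced 105.0.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlanPatternCheck {

    // Filter operator line copied from the failing plan in this issue.
    static final String FILTER_LINE =
        "00-03          Filter(condition=[=($0, 2)]) : rowType = RecordType(ANY department_id, ANY employee_id): "
        + "rowcount = 105.0, cumulative cost = {2310.0 rows, 7350.0 cpu, 2310.0 io, 0.0 network, 0.0 memory}, id = 336735";

    // Pattern the test expected, copied from the error message.
    static final String EXPECTED = "Filter\\(condition.*\\).*rowcount = 96.25,";

    // Returns true if the regex is found anywhere in the given plan line.
    static boolean contains(String planLine, String regex) {
        return Pattern.compile(regex).matcher(planLine).find();
    }

    // Hypothetical helper: extracts the Filter row count from the line, if present.
    static OptionalDouble filterRowCount(String planLine) {
        Matcher m = Pattern
            .compile("Filter\\(condition.*?\\).*?rowcount = ([0-9.]+),")
            .matcher(planLine);
        return m.find()
            ? OptionalDouble.of(Double.parseDouble(m.group(1)))
            : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // The expected pattern is not found because the estimate differs.
        System.out.println(contains(FILTER_LINE, EXPECTED));
        // The row count actually present on the Filter line.
        System.out.println(filterRowCount(FILTER_LINE));
    }
}
```

The first call returns false, which corresponds to the `expected:<true> but was:<false>` assertion in the log; the second recovers the 105.0 estimate that the plan actually contained.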

      List of tests to fix:

      • TestMetastoreCommands.testAnalyzeWithSampleStatistics;
      • TestAnalyze.basic3.
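
The instability described in this issue can be illustrated with a small, self-contained Java sketch. Everything in it is an assumption made for illustration: the table contents are hypothetical (not the real employee data), and the estimate formula (total rows divided by the number of distinct values seen in the sample) is only inferred from the numbers in the log, where 1155 / 12 = 96.25 and 1155 / 11 = 105.0. It is not Drill's actual statistics code. The point is that taking the first N rows from two different scan orders can yield different distinct-value counts, and therefore different row count estimates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

public class SampleInstability {

    // Counts distinct values among the first `limit` rows,
    // mimicking a LIMIT placed above a scan with no ORDER BY.
    static int sampledNdv(List<Integer> rows, int limit) {
        return new HashSet<>(rows.subList(0, Math.min(limit, rows.size()))).size();
    }

    // Assumed estimate for an equality filter: total rows / distinct values.
    static double equalityEstimate(long rowCount, int ndv) {
        return (double) rowCount / ndv;
    }

    // Hypothetical table: 1155 rows, departments 1..11 spread evenly,
    // department 12 present only in the last 5 rows.
    static List<Integer> buildRows() {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1150; i++) {
            rows.add(i % 11 + 1);
        }
        for (int i = 0; i < 5; i++) {
            rows.add(12);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> scanOrderA = buildRows();
        List<Integer> scanOrderB = new ArrayList<>(scanOrderA);
        Collections.reverse(scanOrderB); // same rows, different arrival order

        int limit = 100;
        // Order A never reaches department 12 within the limit: 11 distinct values.
        System.out.println(equalityEstimate(1155, sampledNdv(scanOrderA, limit)));
        // Order B sees department 12 first: 12 distinct values.
        System.out.println(equalityEstimate(1155, sampledNdv(scanOrderB, limit)));
    }
}
```

With this hypothetical data the two scan orders give estimates of 105.0 and 96.25, the same pair of values seen in the failing test. A test that asserts one exact value can only be stable if the sampled rows are the same on every run.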

          People

            volodymyr Vova Vysotskyi
            volodymyr Vova Vysotskyi
            Arina Ielchiieva Arina Ielchiieva