  1. Apache Drill
  2. DRILL-7418

MetadataDirectGroupScan improvements

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.17.0
    • Component/s: None
    • Labels:

      Description

      When a count query is converted to a direct scan (that is, when statistics or table metadata are available and the count operation does not need to be performed), MetadataDirectGroupScan is used. Proposed MetadataDirectGroupScan enhancements:
      1. Show the table selection root instead of listing all table files. If a table has many files, the query plan gets polluted with the full file enumeration. Since only the metadata, not the files, is used for the calculation, the files are not relevant and can be excluded from the plan.

      Before:

      00-00    Screen
      00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
      00-02        DirectScan(groupscan=[files = [/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], numFiles = 11, usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
      

      After:

      00-00    Screen
      00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
      00-02        DirectScan(groupscan=[selectionRoot = /drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
      

      For Hive tables that are scanned directly, the selection root is not available and is therefore omitted.
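The proposed plan output for enhancement 1 can be illustrated with a minimal sketch. This is not Drill's actual implementation: the class DirectScanDigest and its describe method are hypothetical names introduced only for illustration, and the sketch assumes that numFiles and usedMetadataSummaryFile are still printed when the selection root is absent. It shows the plan text carrying the selection root and file count rather than every file path, and skipping the root when none is available (as for a Hive table scanned directly).

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Apache Drill.
final class DirectScanDigest {

    /**
     * Builds the compact scan description used in the plan text.
     *
     * @param selectionRoot table root directory, or null when it is not
     *                      available (e.g. a Hive table scanned directly)
     * @param numFiles number of files covered by the table metadata
     * @param usedMetadataSummaryFile whether a metadata summary file was used
     */
    static String describe(String selectionRoot, int numFiles, boolean usedMetadataSummaryFile) {
        StringJoiner joiner = new StringJoiner(", ");
        // Omit the root entirely instead of printing "null".
        if (selectionRoot != null) {
            joiner.add("selectionRoot = " + selectionRoot);
        }
        joiner.add("numFiles = " + numFiles);
        joiner.add("usedMetadataSummaryFile = " + usedMetadataSummaryFile);
        return joiner.toString();
    }

    public static void main(String[] args) {
        // File-based table: the root replaces the full file enumeration.
        System.out.println(describe(
            "/drill/testdata/metadata_cache/store_sales_null_blocks_all", 11, false));
        // No selection root available: the field is left out.
        System.out.println(describe(null, 11, false));
    }
}
```

The length of the description no longer depends on the number of files in the table, which is the point of the proposed change.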

      2. Submitting a physical plan that contains MetadataDirectGroupScan fails with deserialization errors; proper serialization / deserialization (ser / de) should be implemented.

              People

              • Assignee: Arina Ielchiieva
              • Reporter: Arina Ielchiieva
              • Reviewer: Vova Vysotskyi