Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Impala 2.7.0
Description
Query
select count(*)
from
  store_sales,
  date_dim,
  item
where
  ss_sold_date_sk = d_date_sk and d_year = 2001
  and i_color = "red" and i_item_sk = ss_item_sk;
In the query profile, 28 is the number of files that pass the runtime filter applied on the partition column, and the 114 in PARQUET/NONE:114 is the number of files that were skipped.
It seems that because some partitions are skipped, the scan node never starts reading their files and so never determines the compression codec they use.
This creates the impression that the table has a mix of files with and without Snappy compression.
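The mismatch can be checked directly against the profile text. The following is a minimal sketch in Python and is not part of Impala: the helper names `parse_file_formats` and `parse_files_rejected` are hypothetical, and the regular expressions assume the counter layout shown in this report. It compares the PARQUET/NONE count with the number of files rejected by the runtime filter.

```python
import re

# Excerpt copied from the profile in this report.
PROFILE_EXCERPT = (
    "File Formats: PARQUET/NONE:114 PARQUET/SNAPPY:28 "
    "BytesRead(500.000ms): 0, 0, 403.77 MB "
    "Filter 0 (1.00 MB): - Files processed: 142 (142) "
    "- Files rejected: 114 (114) - Files total: 142 (142)"
)

def parse_file_formats(profile_text):
    """Return {(format, codec): count} from the 'File Formats:' entry."""
    match = re.search(r"File Formats:((?:\s+\w+/\w+:\d+)+)", profile_text)
    if match is None:
        return {}
    return {
        (fmt, codec): int(count)
        for fmt, codec, count in re.findall(r"(\w+)/(\w+):(\d+)", match.group(1))
    }

def parse_files_rejected(profile_text):
    """Return the raw 'Files rejected' value of the first filter, or None."""
    match = re.search(r"Files rejected:\s*\S+\s*\((\d+)\)", profile_text)
    return int(match.group(1)) if match else None

formats = parse_file_formats(PROFILE_EXCERPT)
rejected = parse_files_rejected(PROFILE_EXCERPT)
reported_uncompressed = formats.get(("PARQUET", "NONE"), 0)

# If every "uncompressed" file is really a file the filter skipped,
# the two numbers match and the NONE entry says nothing about the codec.
print("PARQUET/NONE files:", reported_uncompressed)
print("Files rejected by filter:", rejected)
print("Counts match:", reported_uncompressed == rejected)
```

For the profile in this report the two counts are both 114, which is consistent with the skipped files being reported under NONE rather than being genuinely uncompressed.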
HDFS_SCAN_NODE (id=0):(Total: 231.753ms, non-child: 231.753ms, % non-child: 100.00%)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 14:2/234.18 MB 3:2/235.19 MB 11:6/803.17 MB 6:6/907.14 MB 8:4/534.19 MB 16:9/1.16 GB 20:6/804.40 MB 10:6/770.85 MB 23:8/1.08 GB 9:5/652.42 MB 13:5/996.62 MB 21:6/667.40 MB 4:6/1009.93 MB 7:8/1.01 GB 15:6/906.12 MB 12:7/719.26 MB 18:8/1.15 GB 1:4/536.00 MB 17:8/1.35 GB 0:10/1.10 GB 5:4/742.22 MB 22:6/737.18 MB 19:5/412.86 MB 2:5/619.78 MB
  ExecOption: PARQUET Codegen Enabled, Codegen enabled: 28 out of 28
  Runtime filters: All filters arrived. Waited 0
  Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0% 16:0% 17:0% 18:0% 19:0% 20:0% 21:0% 22:0% 23:0% 24:0% 25:0% 26:0% 27:0%
  File Formats: PARQUET/NONE:114 PARQUET/SNAPPY:28
  BytesRead(500.000ms): 0, 0, 403.77 MB
  - FooterProcessingTime: (Avg: 6.642ms ; Min: 1.600ms ; Max: 11.104ms ; Number of samples: 28)
  - AverageHdfsReadThreadConcurrency: 0.00
  - AverageScannerThreadConcurrency: 24.00
  - BytesRead: 403.77 MB (423385013)
  - BytesReadDataNodeCache: 0
  - BytesReadLocal: 403.77 MB (423385013)
  - BytesReadRemoteUnexpected: 0
  - BytesReadShortCircuit: 403.77 MB (423385013)
  - DecompressionTime: 2s277ms
  - MaxCompressedTextFileLength: 0
  - NumColumns: 1 (1)
  - NumDisksAccessed: 19 (19)
  - NumRowGroups: 28 (28)
  - NumScannerThreadsStarted: 24 (24)
  - PeakMemoryUsage: 458.49 MB (480762350)
  - PerReadThreadRawHdfsThroughput: 603.59 MB/sec
  - RemoteScanRanges: 0 (0)
  - RowBatchQueueGetWaitTime: 217.250ms
  - RowBatchQueuePutWaitTime: 25.089ms
  - RowsRead: 88.09M (88089105)
  - RowsReturned: 2.00M (1995236)
  - RowsReturnedRate: 8.61 M/sec
  - ScanRangesComplete: 142 (142)
  - ScannerThreadsInvoluntaryContextSwitches: 2.60K (2595)
  - ScannerThreadsTotalWallClockTime: 19s532ms
  - MaterializeTupleTime(*): 8s089ms
  - ScannerThreadsSysTime: 6s562ms
  - ScannerThreadsUserTime: 9s532ms
  - ScannerThreadsVoluntaryContextSwitches: 110.39K (110392)
  - TotalRawHdfsReadTime(*): 668.947ms
  - TotalReadThroughput: 269.18 MB/sec
  Filter 0 (1.00 MB):
  - Files processed: 142 (142)
  - Files rejected: 114 (114)
  - Files total: 142 (142)
  - RowGroups processed: 1.96K (1960)
  - RowGroups rejected: 0 (0)
  - RowGroups total: 1.96K (1960)
  - Rows processed: 18.10M (18096720)
  - Rows rejected: 0 (0)
  - Rows total: 88.09M (88089105)
  - Splits processed: 28 (28)
  - Splits rejected: 0 (0)
  - Splits total: 28 (28)
  Filter 1 (1.00 MB):
  - Rows processed: 88.09M (88089105)
  - Rows rejected: 86.09M (86093869)
  - Rows total: 88.09M (88089105)
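The filter counters above can be cross-checked with a little arithmetic. This is a rough sketch using only numbers copied from the profile; `rejection_ratio` is a hypothetical helper, not an Impala function. It computes how much each filter rejected and confirms that the rows surviving Filter 1 equal the scan node's RowsReturned.

```python
def rejection_ratio(rejected, total):
    """Fraction of items a filter rejected; 0.0 when nothing was seen."""
    return rejected / total if total else 0.0

# Values copied from the profile counters in this report.
FILES_TOTAL, FILES_REJECTED = 142, 114            # Filter 0, file level
ROWS_TOTAL, ROWS_REJECTED = 88089105, 86093869    # Filter 1, row level
ROWS_RETURNED = 1995236                           # scan node RowsReturned

files_kept = FILES_TOTAL - FILES_REJECTED
rows_kept = ROWS_TOTAL - ROWS_REJECTED

print(f"Filter 0 rejected {rejection_ratio(FILES_REJECTED, FILES_TOTAL):.1%} of files")
print(f"Filter 1 rejected {rejection_ratio(ROWS_REJECTED, ROWS_TOTAL):.1%} of rows")
print("Files actually scanned:", files_kept)
print("Rows kept equals RowsReturned:", rows_kept == ROWS_RETURNED)
```

The 28 files that remain after Filter 0 match both NumRowGroups and the PARQUET/SNAPPY count, so only files that were actually opened have a codec recorded.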
Issue Links
- is related to IMPALA-5311 Select count(*) queries show in incorrect compression in profile (Resolved)
- relates to IMPALA-5311 Select count(*) queries show in incorrect compression in profile (Resolved)