IMPALA / IMPALA-4863

Incorrect accounting of file count and compression type when Runtime filters are applied on partition and non-partition columns

    Details

      Description

      Query

      select count(*)
      from
        store_sales,
        date_dim,
        item
      where
        ss_sold_date_sk = d_date_sk
        and d_year = 2001
        and i_color = "red"
        and i_item_sk = ss_item_sk;
      

      In the query profile, PARQUET/SNAPPY:28 is the number of files that pass the runtime filter applied on the partition column, and PARQUET/NONE:114 is the number of files that were skipped.
      It seems that because the files in the skipped partitions are never opened, the scan node never determines the compression codec they use.

      This creates the impression that the table has a mix of files with and without Snappy compression.
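
The suspected accounting can be illustrated with a toy model. This is only a sketch of the hypothesis stated above, not Impala's actual scan-node code: the function name, the dictionary layout, and the idea that the codec starts out as a `NONE` placeholder are all assumptions made for illustration.

```python
from collections import Counter


def file_format_summary(files, passes_filter):
    """Tally '<format>/<codec>' per file under the suspected behavior.

    Hypothetical model: the codec is only learned when a file is opened,
    so files rejected by the partition filter keep the placeholder 'NONE'.
    """
    counts = Counter()
    for f in files:
        codec = "NONE"  # placeholder until the file is actually read
        if passes_filter(f):
            codec = f["codec"]  # discovered only when the file is scanned
        counts[f"{f['format']}/{codec}"] += 1
    return " ".join(f"{key}:{n}" for key, n in sorted(counts.items()))


# 142 Snappy-compressed Parquet files, 28 of them in qualifying partitions.
files = [
    {"format": "PARQUET", "codec": "SNAPPY", "qualifies": i < 28}
    for i in range(142)
]
summary = file_format_summary(files, lambda f: f["qualifies"])
print(summary)  # PARQUET/NONE:114 PARQUET/SNAPPY:28
```

Under these assumptions every file is Snappy-compressed, yet the summary matches the `File Formats` line in the profile, which is the misleading impression described above.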

             HDFS_SCAN_NODE (id=0):(Total: 231.753ms, non-child: 231.753ms, % non-child: 100.00%)
                Hdfs split stats (<volume id>:<# splits>/<split lengths>): 14:2/234.18 MB 3:2/235.19 MB 11:6/803.17 MB 6:6/907.14 MB 8:4/534.19 MB 16:9/1.16 GB 20:6/804.40 MB 10:6/770.85 MB 23:8/1.08 GB 9:5/652.42 MB 13:5/996.62 MB 21:6/667.40 MB 4:6/1009.93 MB 7:8/1.01 GB 15:6/906.12 MB 12:7/719.26 MB 18:8/1.15 GB 1:4/536.00 MB 17:8/1.35 GB 0:10/1.10 GB 5:4/742.22 MB 22:6/737.18 MB 19:5/412.86 MB 2:5/619.78 MB 
                ExecOption: PARQUET Codegen Enabled, Codegen enabled: 28 out of 28
                Runtime filters: All filters arrived. Waited 0
                Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0% 16:0% 17:0% 18:0% 19:0% 20:0% 21:0% 22:0% 23:0% 24:0% 25:0% 26:0% 27:0% 
                File Formats: PARQUET/NONE:114 PARQUET/SNAPPY:28 
                BytesRead(500.000ms): 0, 0, 403.77 MB
                 - FooterProcessingTime: (Avg: 6.642ms ; Min: 1.600ms ; Max: 11.104ms ; Number of samples: 28)
                 - AverageHdfsReadThreadConcurrency: 0.00 
                 - AverageScannerThreadConcurrency: 24.00 
                 - BytesRead: 403.77 MB (423385013)
                 - BytesReadDataNodeCache: 0
                 - BytesReadLocal: 403.77 MB (423385013)
                 - BytesReadRemoteUnexpected: 0
                 - BytesReadShortCircuit: 403.77 MB (423385013)
                 - DecompressionTime: 2s277ms
                 - MaxCompressedTextFileLength: 0
                 - NumColumns: 1 (1)
                 - NumDisksAccessed: 19 (19)
                 - NumRowGroups: 28 (28)
                 - NumScannerThreadsStarted: 24 (24)
                 - PeakMemoryUsage: 458.49 MB (480762350)
                 - PerReadThreadRawHdfsThroughput: 603.59 MB/sec
                 - RemoteScanRanges: 0 (0)
                 - RowBatchQueueGetWaitTime: 217.250ms
                 - RowBatchQueuePutWaitTime: 25.089ms
                 - RowsRead: 88.09M (88089105)
                 - RowsReturned: 2.00M (1995236)
                 - RowsReturnedRate: 8.61 M/sec
                 - ScanRangesComplete: 142 (142)
                 - ScannerThreadsInvoluntaryContextSwitches: 2.60K (2595)
                 - ScannerThreadsTotalWallClockTime: 19s532ms
                   - MaterializeTupleTime(*): 8s089ms
                   - ScannerThreadsSysTime: 6s562ms
                   - ScannerThreadsUserTime: 9s532ms
                 - ScannerThreadsVoluntaryContextSwitches: 110.39K (110392)
                 - TotalRawHdfsReadTime(*): 668.947ms
                 - TotalReadThroughput: 269.18 MB/sec
                Filter 0 (1.00 MB):
                   - Files processed: 142 (142)
                   - Files rejected: 114 (114)
                   - Files total: 142 (142)
                   - RowGroups processed: 1.96K (1960)
                   - RowGroups rejected: 0 (0)
                   - RowGroups total: 1.96K (1960)
                   - Rows processed: 18.10M (18096720)
                   - Rows rejected: 0 (0)
                   - Rows total: 88.09M (88089105)
                   - Splits processed: 28 (28)
                   - Splits rejected: 0 (0)
                   - Splits total: 28 (28)
                Filter 1 (1.00 MB):
                   - Rows processed: 88.09M (88089105)
                   - Rows rejected: 86.09M (86093869)
                   - Rows total: 88.09M (88089105)
      
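
One way to spot this symptom in a profile is to compare the `PARQUET/NONE` count in `File Formats` with the `Files rejected` counter of the partition filter. The following is a minimal sketch that parses the relevant lines copied from the profile above; the helper names are hypothetical, and the comparison is a heuristic based on this report, not a documented Impala check.

```python
import re

# Relevant lines copied from the profile above.
PROFILE_SNIPPET = """
File Formats: PARQUET/NONE:114 PARQUET/SNAPPY:28
   - Files processed: 142 (142)
   - Files rejected: 114 (114)
   - Files total: 142 (142)
"""


def parse_file_formats(profile):
    """Return {'FORMAT/CODEC': count} from the 'File Formats' line."""
    line = re.search(r"File Formats:\s*(.*)", profile).group(1)
    formats = {}
    for entry in line.split():
        name, count = entry.rsplit(":", 1)
        formats[name] = int(count)
    return formats


def parse_counter(profile, name):
    """Return the exact value in parentheses for a '- <name>: ...' counter."""
    pattern = rf"- {re.escape(name)}: .*\((\d+)\)"
    return int(re.search(pattern, profile).group(1))


formats = parse_file_formats(PROFILE_SNIPPET)
rejected = parse_counter(PROFILE_SNIPPET, "Files rejected")
total = parse_counter(PROFILE_SNIPPET, "Files total")

scanned = total - rejected
# Heuristic: if the NONE count equals the rejected-file count, the NONE
# entries are probably skipped files rather than uncompressed ones.
suspect = formats.get("PARQUET/NONE", 0) == rejected
print(scanned, suspect)  # 28 True
```

Here the 28 files that remain after filtering match both `PARQUET/SNAPPY:28` and `NumRowGroups: 28`, which is consistent with the `NONE` entries being files that were never opened.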

              People

              • Assignee:
                anujphadke Anuj Phadke
                Reporter:
                mmokhtar Mostafa Mokhtar