Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.9.0
Description
It appears that the HDFS scan node miscounts the number of files when reading Parquet data. For the scan node below, the file count should equal the number of row groups and footers (73), but the reported value is 219, which is 73 x NumColumns (3).
HDFS_SCAN_NODE (id=0):(Total: 13s749ms, non-child: 13s749ms, % non-child: 100.00%)
  Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:9/1.90 GB 3:12/2.65 GB 2:5/936.63 MB 6:9/1.74 GB 1:8/1.66 GB 5:10/1.83 GB 0:9/2.07 GB 4:11/2.40 GB
  ExecOption: PARQUET Codegen Enabled, Codegen enabled: 73 out of 73
  Runtime filters: Only following filters arrived: , waited 4s918ms
  Hdfs Read Thread Concurrency Bucket: 0:33.33% 1:48.48% 2:6.061% 3:12.12% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0%
  File Formats: PARQUET/SNAPPY:219
  BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 200.00 KB, 129.86 MB, 314.73 MB, 562.12 MB, 1.09 GB, 1.32 GB, 2.37 GB, 3.68 GB, 4.34 GB, 4.87 GB, 5.22 GB, 5.39 GB, 5.58 GB, 5.63 GB, 5.66 GB, 5.69 GB, 5.71 GB, 5.75 GB, 5.78 GB, 5.82 GB, 5.86 GB, 5.90 GB, 5.94 GB, 5.97 GB
   - FooterProcessingTime: (Avg: 711.035ms ; Min: 12.738ms ; Max: 1s958ms ; Number of samples: 73)
   - AverageHdfsReadThreadConcurrency: 0.97
   - AverageScannerThreadConcurrency: 17.70
   - BytesRead: 6.01 GB (6452101777)
   - BytesReadDataNodeCache: 0
   - BytesReadLocal: 6.01 GB (6452101777)
   - BytesReadRemoteUnexpected: 0
   - BytesReadShortCircuit: 6.01 GB (6452101777)
   - DecompressionTime: 16s189ms
   - MaxCompressedTextFileLength: 0
   - NumColumns: 3 (3)
   - NumDisksAccessed: 8 (8)
   - NumRowGroups: 73 (73)
   - NumScannerThreadsStarted: 52 (52)
   - PeakMemoryUsage: 2.09 GB (2248246487)
   - PerReadThreadRawHdfsThroughput: 363.03 MB/sec
   - RemoteScanRanges: 0 (0)
   - RowBatchQueueGetWaitTime: 8s786ms
   - RowBatchQueuePutWaitTime: 3s079ms
   - RowsRead: 342.13M (342131176)
   - RowsReturned: 2.54M (2537896)
   - RowsReturnedRate: 184.58 K/sec
   - ScanRangesComplete: 73 (73)
   - ScannerThreadsInvoluntaryContextSwitches: 3.97K (3967)
   - ScannerThreadsTotalWallClockTime: 4m41s
   - MaterializeTupleTime(*): 13s302ms
   - ScannerThreadsSysTime: 3s043ms
   - ScannerThreadsUserTime: 26s263ms
   - ScannerThreadsVoluntaryContextSwitches: 23.15K (23148)
   - TotalRawHdfsReadTime(*): 16s949ms
   - TotalReadThroughput: 359.75 MB/sec
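The mismatch in the profile above can be checked mechanically. The following is a minimal Python sketch, not part of Impala: the helper name `parse_scan_counters` and the regular expressions are assumptions made for illustration, and the sample string contains only the three counters relevant to this report. It extracts the `File Formats` count, `NumColumns`, and `NumRowGroups`, then confirms that the reported count is the row-group count multiplied by the column count.

```python
import re


def parse_scan_counters(profile_text):
    """Pull the counters relevant to this report out of scan-node profile text.

    Hypothetical helper for illustration; it assumes each counter appears
    exactly once and in the textual form shown in the profile above.
    """
    file_formats = re.search(r"File Formats:\s*(\S+?):(\d+)", profile_text)
    num_columns = re.search(r"NumColumns:\s*(\d+)", profile_text)
    num_row_groups = re.search(r"NumRowGroups:\s*(\d+)", profile_text)
    return {
        "format": file_formats.group(1),
        "reported_files": int(file_formats.group(2)),
        "num_columns": int(num_columns.group(1)),
        "num_row_groups": int(num_row_groups.group(1)),
    }


# Only the counters needed for the check, copied from the profile above.
SAMPLE = (
    "File Formats: PARQUET/SNAPPY:219 "
    "- NumColumns: 3 (3) "
    "- NumRowGroups: 73 (73)"
)

counters = parse_scan_counters(SAMPLE)

# The report expects one file per row group, i.e. 73 files.
expected_files = counters["num_row_groups"]

# The reported value matches row groups x columns instead.
overcounted = (
    counters["reported_files"]
    == counters["num_row_groups"] * counters["num_columns"]
)

print(counters["format"], counters["reported_files"], expected_files, overcounted)
```

For this profile the check shows the reported count (219) is exactly three times the expected 73, consistent with the file counter being incremented once per column rather than once per file.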
Issue Links
- breaks: IMPALA-6040 test_multi_compression_types uses hive in incompatible environments (Resolved)