IMPALA-5448

Invalid number of files reported in Parquet scan node

      Description

      The number of files reported by the HDFS scan node appears to be miscounted when reading Parquet data. For the scan node below, the file count should equal the number of row groups and footers (NumRowGroups is 73, and FooterProcessingTime has 73 samples), but the reported value is 219, which is 73 x NumColumns (3).

        HDFS_SCAN_NODE (id=0):(Total: 13s749ms, non-child: 13s749ms, % non-child: 100.00%)
                Hdfs split stats (<volume id>:<# splits>/<split lengths>): 7:9/1.90 GB 3:12/2.65 GB 2:5/936.63 MB 6:9/1.74 GB 1:8/1.66 GB 5:10/1.83 GB 0:9/2.07 GB 4:11/2.40 GB 
                ExecOption: PARQUET Codegen Enabled, Codegen enabled: 73 out of 73
                Runtime filters: Only following filters arrived: , waited 4s918ms
                Hdfs Read Thread Concurrency Bucket: 0:33.33% 1:48.48% 2:6.061% 3:12.12% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 
                File Formats: PARQUET/SNAPPY:219 
                BytesRead(500.000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 200.00 KB, 129.86 MB, 314.73 MB, 562.12 MB, 1.09 GB, 1.32 GB, 2.37 GB, 3.68 GB, 4.34 GB, 4.87 GB, 5.22 GB, 5.39 GB, 5.58 GB, 5.63 GB, 5.66 GB, 5.69 GB, 5.71 GB, 5.75 GB, 5.78 GB, 5.82 GB, 5.86 GB, 5.90 GB, 5.94 GB, 5.97 GB
                 - FooterProcessingTime: (Avg: 711.035ms ; Min: 12.738ms ; Max: 1s958ms ; Number of samples: 73)
                 - AverageHdfsReadThreadConcurrency: 0.97 
                 - AverageScannerThreadConcurrency: 17.70 
                 - BytesRead: 6.01 GB (6452101777)
                 - BytesReadDataNodeCache: 0
                 - BytesReadLocal: 6.01 GB (6452101777)
                 - BytesReadRemoteUnexpected: 0
                 - BytesReadShortCircuit: 6.01 GB (6452101777)
                 - DecompressionTime: 16s189ms
                 - MaxCompressedTextFileLength: 0
                 - NumColumns: 3 (3)
                 - NumDisksAccessed: 8 (8)
                 - NumRowGroups: 73 (73)
                 - NumScannerThreadsStarted: 52 (52)
                 - PeakMemoryUsage: 2.09 GB (2248246487)
                 - PerReadThreadRawHdfsThroughput: 363.03 MB/sec
                 - RemoteScanRanges: 0 (0)
                 - RowBatchQueueGetWaitTime: 8s786ms
                 - RowBatchQueuePutWaitTime: 3s079ms
                 - RowsRead: 342.13M (342131176)
                 - RowsReturned: 2.54M (2537896)
                 - RowsReturnedRate: 184.58 K/sec
                 - ScanRangesComplete: 73 (73)
                 - ScannerThreadsInvoluntaryContextSwitches: 3.97K (3967)
                 - ScannerThreadsTotalWallClockTime: 4m41s
                   - MaterializeTupleTime(*): 13s302ms
                   - ScannerThreadsSysTime: 3s043ms
                   - ScannerThreadsUserTime: 26s263ms
                 - ScannerThreadsVoluntaryContextSwitches: 23.15K (23148)
                 - TotalRawHdfsReadTime(*): 16s949ms
                 - TotalReadThroughput: 359.75 MB/sec
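
The arithmetic in the description can be checked mechanically against the counters quoted above. The following is a minimal sketch, not Impala code: the helper names `counter` and `file_format_counts` are hypothetical, the parsing assumes only the text layout visible in this profile excerpt, and the check shows that the numbers match the reported pattern without establishing the root cause.

```python
import re

# Counters copied from the profile excerpt above.
PROFILE = """\
File Formats: PARQUET/SNAPPY:219
 - NumColumns: 3 (3)
 - NumRowGroups: 73 (73)
 - ScanRangesComplete: 73 (73)
"""


def counter(profile, name):
    """Return the raw integer shown in parentheses for ' - <name>: ...'."""
    match = re.search(rf"- {re.escape(name)}: .*\((\d+)\)", profile)
    if match is None:
        raise KeyError(name)
    return int(match.group(1))


def file_format_counts(profile):
    """Parse the 'File Formats:' line into {'FORMAT/CODEC': count}."""
    line = re.search(r"File Formats: (.*)", profile)
    if line is None:
        return {}
    return {fmt: int(n) for fmt, n in re.findall(r"(\S+):(\d+)", line.group(1))}


row_groups = counter(PROFILE, "NumRowGroups")
columns = counter(PROFILE, "NumColumns")
reported = file_format_counts(PROFILE)["PARQUET/SNAPPY"]

# Expected per the description: one file per row group / footer.
print("expected:", row_groups)                                # expected: 73
print("reported:", reported)                                  # reported: 219
print("matches row groups x columns:", reported == row_groups * columns)  # True
```

The same check could be run over a full profile to flag scan nodes where the "File Formats" total exceeds ScanRangesComplete.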
      

              People

              • Assignee: Quanlong Huang (stiga-huang)
              • Reporter: Mostafa Mokhtar (mmokhtar)