IMPALA-292

Parquet performance issues on large dataset

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 0.7
    • Fix Version/s: Impala 1.0
    • Component/s: None
    • Labels: None

    Description

      A simple aggregation query (TPC-H Q1) over a 10 TB dataset stored in Parquet format takes 35 minutes to complete, compared with 15 minutes for RC/SNAP.

      Parquet
      -----------
           HDFS_SCAN_NODE (id=0):(14m36s 57.33%)
               - AverageHdfsReadThreadConcurrency: 0.61 
                 - HdfsReadThreadConcurrencyCountPercentage=0: 56.68
                 - HdfsReadThreadConcurrencyCountPercentage=1: 30.59
                 - HdfsReadThreadConcurrencyCountPercentage=2: 9.21
                 - HdfsReadThreadConcurrencyCountPercentage=3: 2.33
                 - HdfsReadThreadConcurrencyCountPercentage=4: 0.76
                 - HdfsReadThreadConcurrencyCountPercentage=5: 0.31
                 - HdfsReadThreadConcurrencyCountPercentage=6: 0.08
                 - HdfsReadThreadConcurrencyCountPercentage=7: 0.04
                 - HdfsReadThreadConcurrencyCountPercentage=8: 0.01
                 - HdfsReadThreadConcurrencyCountPercentage=9: 0.00
                 - HdfsReadThreadConcurrencyCountPercentage=10: 0.00
                 - HdfsReadThreadConcurrencyCountPercentage=11: 0.00
                 - HdfsReadThreadConcurrencyCountPercentage=12: 0.00
               - AverageScannerThreadConcurrency: 1.47 
               - BytesRead: 97.49 GB
               - DecompressionTime: 8m10s
               - MemoryUsed: 0.00 
               - NumDisksAccessed: 11
               - PerReadThreadRawHdfsThroughput: 109.55 MB/sec
               - RowsReturned: 5.92B (5915604448)
               - RowsReturnedRate: 6.76 M/sec
               - ScanRangesComplete: 934
               - ScannerThreadsInvoluntaryContextSwitches: 256.99K (256986)
               - ScannerThreadsTotalWallClockTime: 202h50m
                 - MaterializeTupleTime: 0ns
                 - ScannerThreadsSysTime: 22s143ms
                 - ScannerThreadsUserTime: 35m34s
               - ScannerThreadsVoluntaryContextSwitches: 202.23K (202226)
               - TotalRawHdfsReadTime: 15m29s
               - TotalReadThroughput: 66.38 MB/sec
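
The counters in the profile above can be cross-checked against each other with a little arithmetic. The sketch below is only an illustration and rests on several assumptions that the report does not state: that AverageHdfsReadThreadConcurrency is the percentage-weighted mean of the HdfsReadThreadConcurrencyCountPercentage histogram, that RowsReturnedRate is RowsReturned divided by the scan node's time, and that DecompressionTime and ScannerThreadsUserTime are comparable totals. All variable and function names are made up for this example.

```python
# Values copied from the Parquet HDFS_SCAN_NODE profile in this report.
# Key: number of HDFS read threads active at once; value: percent of samples.
read_concurrency_pct = {
    0: 56.68, 1: 30.59, 2: 9.21, 3: 2.33, 4: 0.76, 5: 0.31, 6: 0.08,
    7: 0.04, 8: 0.01, 9: 0.00, 10: 0.00, 11: 0.00, 12: 0.00,
}


def weighted_mean(hist_pct):
    """Mean of the keys, weighted by their percentages."""
    total = sum(hist_pct.values())
    return sum(k * pct for k, pct in hist_pct.items()) / total


def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Assumption: the reported average (0.61) is the weighted mean of the histogram.
avg_read_concurrency = weighted_mean(read_concurrency_pct)

# Share of samples in which no HDFS read thread was active at all.
idle_share = read_concurrency_pct[0] / sum(read_concurrency_pct.values())

# Assumption: RowsReturnedRate = RowsReturned / scan node time (reported 14m36s).
rows_returned = 5_915_604_448
rows_per_second = 6.76e6
implied_scan_seconds = rows_returned / rows_per_second

# Rough ratio only: decompression time relative to scanner thread user CPU time.
decompression_share = to_seconds(minutes=8, seconds=10) / to_seconds(minutes=35, seconds=34)

# End-to-end slowdown of Parquet relative to RC/SNAP (35 min vs 15 min).
slowdown = 35 / 15

print(f"average read-thread concurrency: {avg_read_concurrency:.2f}")
print(f"samples with no active read thread: {idle_share:.1%}")
print(f"scan time implied by row rate: {implied_scan_seconds:.0f} s")
print(f"decompression / scanner user time: {decompression_share:.1%}")
print(f"Parquet slowdown vs RC/SNAP: {slowdown:.2f}x")
```

Under these assumptions the weighted mean comes out at about 0.61, which agrees with the reported AverageHdfsReadThreadConcurrency, and the row rate implies roughly 875 seconds, close to the 14m36s shown for the scan node. The histogram also shows that more than half of the samples had no HDFS read thread active.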
      
      

          People

            Assignee: Nong Li
            Reporter: Lenni Kuff