Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-789

Runtime profile average profiles are not computed correctly.

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • Impala 1.3
    • Impala 1.3
    • None
    • None

    Description

      The average times look messed up. They are way higher than the actual individual fragment times and
      it looks we are just summing the values instead of averaging them.

      From the 10 node cluster:

      Averaged Fragment 2:(Active: 20m52s, % non-child: 0.00%)
            split sizes:  min: 7.24 GB, max: 7.44 GB, avg: 7.36 GB, stddev: 50.58 MB
            completion times: min:6s010ms  max:11m7s  mean: 5m21s  stddev:3m2s
            execution rates: min:11.20 MB/sec  max:1.24 GB/sec  mean:58.45 MB/sec  stddev:123.35 MB/sec
            num instances: 10
             - AverageThreadTokens: 226.98 
             - PeakMemoryUsage: 10.65 GB
             - PrepareTime: 6s189ms
             - RowsProduced: 8
             - TotalCpuTime: 39m52s
             - TotalNetworkWaitTime: 3.736ms
             - TotalStorageWaitTime: 41m49s
            CodeGen:(Active: 1m22s, % non-child: 0.00%)
               - CodegenTime: 5s887ms
               - CompileTime: 1m20s
               - LoadTime: 1s859ms
               - ModuleFileSize: 12.00 MB
            DataStreamSender (dst_id=2):(Active: 92.836ms, % non-child: 0.00%)
               - BytesSent: 724.00 B
               - NetworkThroughput(*): 569.90 KB/sec
               - OverallThroughput: 5.58 MB/sec
               - PeakMemoryUsage: 1.30 MB
               - SerializeBatchTime: 131.735us
               - ThriftTransmitTime(*): 3.255ms
               - UncompressedRowBatchSize: 976.00 B
            AGGREGATION_NODE (id=1):(Active: 20m55s, % non-child: 0.00%)
               - BuildBuckets: 116.53K (116531)
               - BuildTime: 16m45s
               - GetResultsTime: 49.275us
               - LoadFactor: 0.44 
               - PeakMemoryUsage: 945.14 MB
               - RowsReturned: 8
               - RowsReturnedRate: 0
            HDFS_SCAN_NODE (id=0):(Active: 9h51m, % non-child: 0.00%)                <-- This is way too high. This has always been wallclock time.
               - AverageHdfsReadThreadConcurrency: 3.78 
               - AverageScannerThreadConcurrency: 113.80 
               - BytesRead: 427.60 GB
               - BytesReadLocal: 14.71 GB
               - BytesReadShortCircuit: 14.71 GB
               - NumDisksAccessed: 22
               - NumScannerThreadsStarted: 113
               - PeakMemoryUsage: 10.65 GB
               - PerReadThreadRawHdfsThroughput: 43.58 GB/sec
               - RowsRead: 3.46B (3455736751)
               - RowsReturned: 3.41B (3407086725)
               - RowsReturnedRate: 10.85 M/sec
               - ScanRangesComplete: 3.34K (3340)
               - ScannerThreadsInvoluntaryContextSwitches: 273.82K (273818)
               - ScannerThreadsTotalWallClockTime: 20m52s
                 - DelimiterParseTime: 3h2m
                 - MaterializeTupleTime(*): 1h27m
                 - ScannerThreadsSysTime: 1s367ms
                 - ScannerThreadsUserTime: 5m43s
               - ScannerThreadsVoluntaryContextSwitches: 632
               - TotalRawHdfsReadTime(*): 19m8s
               - TotalReadThroughput: 1.33 GB/sec
      
      Fragment 2:                                                                       <--- This is the slowest fragment 2 instance.
            Instance a34348c0c6c35c5f:35890889650ed2a0 (host=c2308.halxg.cloudera.com:22000):(Active: 11m7s, % non-child: 0.00%)
              Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:7/896.00 MB 1:3/384.00 MB 2:6/768.00 MB 3:4/512.00 MB 4:5/640.00 MB 5:10/1.25 GB 6:7/896.00 MB 7:7/896.00 MB 8:2/256.00 MB 9:3/304.72 MB 10:5/640.00 MB 
              MemoryUsage(16s000ms): 18.40 MB, 56.16 MB, 52.91 MB, 41.16 MB, 56.66 MB, 45.66 MB, 47.66 MB, 42.41 MB, 34.66 MB, 55.16 MB, 46.91 MB, 56.91 MB, 50.16 MB, 52.41 MB, 50.41 MB, 43.85 MB, 39.44 MB, 49.16 MB, 52.41 MB, 51.41 MB, 50.16 MB, 57.41 MB, 43.91 MB, 44.92 MB, 54.16 MB, 41.41 MB, 54.16 MB, 30.41 MB, 28.16 MB, 36.91 MB, 47.91 MB, 48.66 MB, 21.91 MB, 30.91 MB, 11.66 MB, 37.91 MB, 27.41 MB, 49.91 MB, 43.16 MB, 8.16 MB, 18.91 MB
              ThreadUsage(16s000ms): 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
               - AverageThreadTokens: 2.00 
               - PeakMemoryUsage: 120.25 MB
               - PrepareTime: 133.545ms
               - RowsProduced: 4
               - TotalCpuTime: 21m3s
               - TotalNetworkWaitTime: 1.860ms
               - TotalStorageWaitTime: 1m12s
              CodeGen:(Active: 1s306ms, % non-child: 0.20%)
                 - CodegenTime: 113.344ms
                 - CompileTime: 1s275ms
                 - LoadTime: 27.468ms
                 - ModuleFileSize: 107.95 KB
              DataStreamSender (dst_id=2):(Active: 6.278ms, % non-child: 0.00%)
                 - BytesSent: 359.00 B
                 - NetworkThroughput(*): 215.54 KB/sec
                 - OverallThroughput: 55.84 KB/sec
                 - PeakMemoryUsage: 27.41 KB
                 - SerializeBatchTime: 69.127us
                 - ThriftTransmitTime(*): 1.626ms
                 - UncompressedRowBatchSize: 488.00 B
              AGGREGATION_NODE (id=1):(Active: 11m7s, % non-child: 2.61%)
                ExecOption: Codegen Enabled
                 - BuildBuckets: 1.02K (1024)
                 - BuildTime: 17s213ms
                 - GetResultsTime: 25.631us
                 - LoadFactor: 0.00 
                 - PeakMemoryUsage: 8.31 MB
                 - RowsReturned: 4
                 - RowsReturnedRate: 0
              HDFS_SCAN_NODE (id=0):(Active: 10m49s, % non-child: 97.39%)               <--- This number looks closer to reasonable.
                File Formats: TEXT/NONE:59 
                ExecOption: Codegen enabled: 59 out of 59
                 - AverageHdfsReadThreadConcurrency: 0.03 
                 - AverageScannerThreadConcurrency: 1.00 
                 - BytesRead: 7.30 GB
                 - BytesReadLocal: 7.30 GB
                 - BytesReadShortCircuit: 7.30 GB
                 - NumDisksAccessed: 11
                 - NumScannerThreadsStarted: 1
                 - PeakMemoryUsage: 120.19 MB
                 - PerReadThreadRawHdfsThroughput: 350.58 MB/sec
                 - RowsRead: 59.52M (59522180)
                 - RowsReturned: 58.69M (58685909)
                 - RowsReturnedRate: 90.31 K/sec
                 - ScanRangesComplete: 59
                 - ScannerThreadsInvoluntaryContextSwitches: 159.04K (159044)
                 - ScannerThreadsTotalWallClockTime: 11m7s
                   - DelimiterParseTime: 3m39s
                   - MaterializeTupleTime(*): 1m47s
                   - ScannerThreadsSysTime: 883.865ms
                   - ScannerThreadsUserTime: 2m50s
                 - ScannerThreadsVoluntaryContextSwitches: 353
                 - TotalRawHdfsReadTime(*): 21s315ms
                 - TotalReadThroughput: 11.19 MB/sec
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            srinathshankar Srinath
            nong_impala_60e1 Nong Li
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment