Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4511

Regression : Reported Fragment total time is incorrect

    Details

      Description

      A recent change seems to have broken the fragment reported timings

          Averaged Fragment F00:(Total: 43.874ms, non-child: 0.000ns, % non-child: 0.00%)
            split sizes:  min: 7.88 GB, max: 7.95 GB, avg: 7.92 GB, stddev: 21.60 MB
            completion times: min:19s509ms  max:19s883ms  mean: 19s720ms  stddev:112.439ms
            execution rates: min:407.03 MB/sec  max:416.52 MB/sec  mean:411.07 MB/sec  stddev:2.89 MB/sec
            num instances: 15
             - AverageThreadTokens: 11.32 
             - BloomFilterBytes: 17.00 MB (17825792)
             - PeakMemoryUsage: 123.02 MB (128992933)
             - PerHostPeakMemUsage: 844.39 MB (885405464)
             - RowsProduced: 6.75M (6745672)
             - TotalCpuTime: 3m25s
             - TotalNetworkReceiveTime: 0.000ns
             - TotalNetworkSendTime: 17s164ms
             - TotalStorageWaitTime: 394.254ms
            Fragment Instance Lifecycle Timings:
               - ExecTime: 19s670ms
                 - ExecTreeExecTime: 1s034ms
               - OpenTime: 17.251ms
                 - ExecTreeOpenTime: 142.269us
               - PrepareTime: 26.600ms
                 - ExecTreePrepareTime: 20.760ms
      

      Before the regression

          Averaged Fragment F00:(Total: 29s527ms, non-child: 3s362ms, % non-child: 11.39%)
            split sizes:  min: 11.46 GB, max: 12.01 GB, avg: 11.77 GB, stddev: 124.72 MB
            completion times: min:17s281ms  max:46s307ms  mean: 30s414ms  stddev:7s903ms
            execution rates: min:255.29 MB/sec  max:699.00 MB/sec  mean:426.25 MB/sec  stddev:121.13 MB/sec
            num instances: 32
             - AverageThreadTokens: 3.86 
             - BloomFilterBytes: 27.00 MB (28311552)
             - PeakMemoryUsage: 219.11 MB (229755142)
             - PerHostPeakMemUsage: 1.48 GB (1588976755)
             - PrepareTime: 171.122ms
             - RowsProduced: 7.59M (7590805)
             - TotalCpuTime: 1m42s
             - TotalNetworkReceiveTime: 16s432ms
             - TotalNetworkSendTime: 236.902ms
             - TotalStorageWaitTime: 499.184ms
      
      1. TableSinkProfile.txt
        66 kB
        Mostafa Mokhtar
      2. TPCH-Q3.txt
        1.46 MB
        Mostafa Mokhtar

        Activity

        Hide
        dhecht Dan Hecht added a comment -

        Mostafa Mokhtar what's the (last good githash, first bad githash) pair you know of (to help narrow down the change that broke it)?

        Show
        dhecht Dan Hecht added a comment - Mostafa Mokhtar what's the (last good githash, first bad githash) pair you know of (to help narrow down the change that broke it)?
        Show
        mmokhtar Mostafa Mokhtar added a comment - Dan Hecht The regression was introduced between http://github.mtv.cloudera.com/CDH/Impala/commit/6117e0452bdad15c45a18a1f401a63990b96b05d and http://github.mtv.cloudera.com/CDH/Impala/commit/a441e6cc46ebe65a42fef02f3de7b96f296aa4a4 My money is on Henry Robinson http://github.mtv.cloudera.com/CDH/Impala/commit/a82b0575468cfb34bdbadddcd6704d37c630d540
        Hide
        dhecht Dan Hecht added a comment -
        * a441e6c (internal/cdh5-trunk) IMPALA-4299: add buildall.sh option to start test cluster (by Tim Armstrong 5 weeks ago)
        * 33f799b IMPALA-4270: Gracefully fail unsupported queries with mt_dop > 0. (by Alex Behm 6 weeks ago)
        * f0ab9aa IMPALA-2789: More compact mem layout with null bits at the end. (by Alex Behm 11 months ago)
        * a82b057 IMPALA-2905: Handle coordinator fragment lifecycle like all others (by Henry Robinson 6 weeks ago)
        * ba788c6 IMPALA-4294: Make check-schema-diff.sh executable from anywhere. (by David Knupp 5 weeks ago)
        * c6bd992 IMPALA-3348: Avoid per-slot check vector size in KuduScanner (by Matthew Jacobs 5 months ago)
        * 9a8bc46 IMPALA-3002/IMPALA-1473: Cardinality observability cleanup (by Thomas Tauber-Marshall 6 weeks ago)
        * f7cc3d1 IMPALA-4102: Remote Kudu reads should be reported (by Matthew Jacobs 2 months ago)
        * 9c2a1bb IMPALA-3644 Make predicate order deterministic (by Lars Volker 6 weeks ago)
        

        Yeah, seems likely. Henry Robinson, could you take a look?

        Show
        dhecht Dan Hecht added a comment - * a441e6c (internal/cdh5-trunk) IMPALA-4299: add buildall.sh option to start test cluster (by Tim Armstrong 5 weeks ago) * 33f799b IMPALA-4270: Gracefully fail unsupported queries with mt_dop > 0. (by Alex Behm 6 weeks ago) * f0ab9aa IMPALA-2789: More compact mem layout with null bits at the end. (by Alex Behm 11 months ago) * a82b057 IMPALA-2905: Handle coordinator fragment lifecycle like all others (by Henry Robinson 6 weeks ago) * ba788c6 IMPALA-4294: Make check-schema-diff.sh executable from anywhere. (by David Knupp 5 weeks ago) * c6bd992 IMPALA-3348: Avoid per-slot check vector size in KuduScanner (by Matthew Jacobs 5 months ago) * 9a8bc46 IMPALA-3002/IMPALA-1473: Cardinality observability cleanup (by Thomas Tauber-Marshall 6 weeks ago) * f7cc3d1 IMPALA-4102: Remote Kudu reads should be reported (by Matthew Jacobs 2 months ago) * 9c2a1bb IMPALA-3644 Make predicate order deterministic (by Lars Volker 6 weeks ago) Yeah, seems likely. Henry Robinson , could you take a look?
        Hide
        srus Silvius Rus added a comment -
        Show
        srus Silvius Rus added a comment - Mostafa Mokhtar can you please use references to https://github.com/apache/incubator-impala rather than http://github.mtv.cloudera.com/CDH/Impala .
        Show
        henryr Henry Robinson added a comment - https://github.com/apache/incubator-impala/commit/eb0318609b11cba41e78bb105d266046c81cf6c9
        Hide
        mmokhtar Mostafa Mokhtar added a comment -

        Henry Robinson

        The profile timings are still incorrect, check "Total" Vs. min & max timings TPCH-Q3.txt

            Averaged Fragment F04:(Total: 4s675ms, non-child: 0.000ns, % non-child: 0.00%)
              split sizes:  min: 0, max: 0, avg: 0, stddev: 0
              completion times: min:26s648ms  max:26s683ms  mean: 26s658ms  stddev:10.488ms
              execution rates: min:0.00 /sec  max:0.00 /sec  mean:0.00 /sec  stddev:0.00 /sec
              num instances: 15
               - AverageThreadTokens: 1.01 
               - BloomFilterBytes: 4.00 MB (4194304)
               - InvoluntaryContextSwitches: 304 (304)
               - PeakMemoryUsage: 302.81 MB (317516591)
               - PerHostPeakMemUsage: 1.60 GB (1714962635)
               - RowsProduced: 226.27K (226273)
               - TotalNetworkReceiveTime: 25s580ms
               - TotalNetworkSendTime: 3.799ms
               - TotalStorageWaitTime: 0.000ns
               - TotalWallClockTime: 26s741ms
                 - SysTime: 107.849ms
                 - UserTime: 1s190ms
               - VoluntaryContextSwitches: 7.55K (7553)
              Fragment Instance Lifecycle Timings:
                 - ExecTime: 21s981ms
                   - ExecTreeExecTime: 21s873ms
                 - OpenTime: 4s618ms
                   - ExecTreeOpenTime: 4s223ms
                 - PrepareTime: 57.042ms
                   - ExecTreePrepareTime: 875.116us
        
        Show
        mmokhtar Mostafa Mokhtar added a comment - Henry Robinson The profile timings are still incorrect, check "Total" Vs. min & max timings TPCH-Q3.txt Averaged Fragment F04:(Total: 4s675ms, non-child: 0.000ns, % non-child: 0.00%) split sizes: min: 0, max: 0, avg: 0, stddev: 0 completion times: min:26s648ms max:26s683ms mean: 26s658ms stddev:10.488ms execution rates: min:0.00 /sec max:0.00 /sec mean:0.00 /sec stddev:0.00 /sec num instances: 15 - AverageThreadTokens: 1.01 - BloomFilterBytes: 4.00 MB (4194304) - InvoluntaryContextSwitches: 304 (304) - PeakMemoryUsage: 302.81 MB (317516591) - PerHostPeakMemUsage: 1.60 GB (1714962635) - RowsProduced: 226.27K (226273) - TotalNetworkReceiveTime: 25s580ms - TotalNetworkSendTime: 3.799ms - TotalStorageWaitTime: 0.000ns - TotalWallClockTime: 26s741ms - SysTime: 107.849ms - UserTime: 1s190ms - VoluntaryContextSwitches: 7.55K (7553) Fragment Instance Lifecycle Timings: - ExecTime: 21s981ms - ExecTreeExecTime: 21s873ms - OpenTime: 4s618ms - ExecTreeOpenTime: 4s223ms - PrepareTime: 57.042ms - ExecTreePrepareTime: 875.116us
        Hide
        mmokhtar Mostafa Mokhtar added a comment -

        Fragment with table sink has incorrect timings as well. TableSinkProfile.txt

            Averaged Fragment F00:(Total: 38.946ms, non-child: 0.000ns, % non-child: 0.00%)
              split sizes:  min: 39.27 MB, max: 84.67 MB, avg: 47.67 MB, stddev: 15.36 MB
              completion times: min:2562047h47m  max:-9223372036854775808.000ns  mean: -nanns  stddev:0.000ns
              execution rates: min:8589934592.00 GB/sec  max:-9223372036854775808.00 B/sec  mean:-nan B/sec  stddev:0.00 /sec
              num instances: 13
               - AverageThreadTokens: 1.83 
               - BloomFilterBytes: 0
               - InvoluntaryContextSwitches: 397 (397)
               - PeakMemoryUsage: 133.62 MB (140110965)
               - PerHostPeakMemUsage: 133.62 MB (140110965)
               - RowsProduced: 923.08K (923076)
               - TotalNetworkReceiveTime: 0.000ns
               - TotalNetworkSendTime: 0.000ns
               - TotalStorageWaitTime: 53.426ms
               - TotalWallClockTime: 4s353ms
                 - SysTime: 239.962ms
                 - UserTime: 2s314ms
               - VoluntaryContextSwitches: 1.52K (1517)
              Fragment Instance Lifecycle Timings:
                 - ExecTime: 2s294ms
                   - ExecTreeExecTime: 74.614ms
                 - OpenTime: 14.390ms
                   - ExecTreeOpenTime: 37.820us
                 - PrepareTime: 24.490ms
                   - ExecTreePrepareTime: 149.684us
              BlockMgr:
                 - BlockWritesOutstanding: 0 (0)
                 - BlocksCreated: 0 (0)
                 - BlocksRecycled: 0 (0)
                 - BufferedPins: 0 (0)
                 - BytesWritten: 0
                 - MaxBlockSize: 8.00 MB (8388608)
                 - MemoryLimit: 80.00 GB (85899345920)
                 - PeakMemoryUsage: 0
                 - ScratchFileUsedBytes: 0
                 - TotalBufferWaitTime: 0.000ns
                 - TotalEncryptionTime: 0.000ns
                 - TotalReadBlockTime: 0.000ns
              HdfsTableSink:(Total: 2s169ms, non-child: 2s169ms, % non-child: 100.00%)
                 - BytesWritten: 47.21 MB (49505546)
                 - CompressTimer: 260.018ms
                 - EncodeTimer: 1s680ms
                 - FilesCreated: 1 (1)
                 - FinalizePartitionFileTimer: 445.748ms
                 - HdfsWriteTimer: 253.939ms
                 - PartitionsCreated: 1 (1)
                 - PeakMemoryUsage: 60.00 MB (62910135)
                 - RowsInserted: 923.08K (923076)
                 - TmpFileCreateTimer: 41.257ms
        
        Show
        mmokhtar Mostafa Mokhtar added a comment - Fragment with table sink has incorrect timings as well. TableSinkProfile.txt Averaged Fragment F00:(Total: 38.946ms, non-child: 0.000ns, % non-child: 0.00%) split sizes: min: 39.27 MB, max: 84.67 MB, avg: 47.67 MB, stddev: 15.36 MB completion times: min:2562047h47m max:-9223372036854775808.000ns mean: -nanns stddev:0.000ns execution rates: min:8589934592.00 GB/sec max:-9223372036854775808.00 B/sec mean:-nan B/sec stddev:0.00 /sec num instances: 13 - AverageThreadTokens: 1.83 - BloomFilterBytes: 0 - InvoluntaryContextSwitches: 397 (397) - PeakMemoryUsage: 133.62 MB (140110965) - PerHostPeakMemUsage: 133.62 MB (140110965) - RowsProduced: 923.08K (923076) - TotalNetworkReceiveTime: 0.000ns - TotalNetworkSendTime: 0.000ns - TotalStorageWaitTime: 53.426ms - TotalWallClockTime: 4s353ms - SysTime: 239.962ms - UserTime: 2s314ms - VoluntaryContextSwitches: 1.52K (1517) Fragment Instance Lifecycle Timings: - ExecTime: 2s294ms - ExecTreeExecTime: 74.614ms - OpenTime: 14.390ms - ExecTreeOpenTime: 37.820us - PrepareTime: 24.490ms - ExecTreePrepareTime: 149.684us BlockMgr: - BlockWritesOutstanding: 0 (0) - BlocksCreated: 0 (0) - BlocksRecycled: 0 (0) - BufferedPins: 0 (0) - BytesWritten: 0 - MaxBlockSize: 8.00 MB (8388608) - MemoryLimit: 80.00 GB (85899345920) - PeakMemoryUsage: 0 - ScratchFileUsedBytes: 0 - TotalBufferWaitTime: 0.000ns - TotalEncryptionTime: 0.000ns - TotalReadBlockTime: 0.000ns HdfsTableSink:(Total: 2s169ms, non-child: 2s169ms, % non-child: 100.00%) - BytesWritten: 47.21 MB (49505546) - CompressTimer: 260.018ms - EncodeTimer: 1s680ms - FilesCreated: 1 (1) - FinalizePartitionFileTimer: 445.748ms - HdfsWriteTimer: 253.939ms - PartitionsCreated: 1 (1) - PeakMemoryUsage: 60.00 MB (62910135) - RowsInserted: 923.08K (923076) - TmpFileCreateTimer: 41.257ms
        Hide
        henryr Henry Robinson added a comment -

        Mostafa Mokhtar - the TPCH-Q3 example you posted looks very much like the issue that was already fixed. We do have tests that should trigger if the total time is less than the exec + open + prepare time on every query that gets run in the test suite, so I'd be surprised to see it regress as significantly as this without the tests.

        The sink timings is a different issue. I'll try to look into that today or tomorrow.

        Show
        henryr Henry Robinson added a comment - Mostafa Mokhtar - the TPCH-Q3 example you posted looks very much like the issue that was already fixed. We do have tests that should trigger if the total time is less than the exec + open + prepare time on every query that gets run in the test suite, so I'd be surprised to see it regress as significantly as this without the tests. The sink timings is a different issue. I'll try to look into that today or tomorrow.
        Hide
        henryr Henry Robinson added a comment -

        I've filed IMPALA-4608 for the sink timings issue. I'm going to close this in the next day unless I hear that your build with the wrong total timings definitely included the patch for this issue.

        Show
        henryr Henry Robinson added a comment - I've filed IMPALA-4608 for the sink timings issue. I'm going to close this in the next day unless I hear that your build with the wrong total timings definitely included the patch for this issue.
        Hide
        mmokhtar Mostafa Mokhtar added a comment -
        Show
        mmokhtar Mostafa Mokhtar added a comment - Henry Robinson Below is the last commit used for Q3 profile which is after http://github.mtv.cloudera.com/CDH/Impala/commit/f6f910016b442fd621cde4a536c7ce61b02897e0
        Hide
        henryr Henry Robinson added a comment -

        IMPALA-4613 is the new regression.

        Show
        henryr Henry Robinson added a comment - IMPALA-4613 is the new regression.

          People

          • Assignee:
            henryr Henry Robinson
            Reporter:
            mmokhtar Mostafa Mokhtar
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development