IMPALA-4026: Investigate regression introduced by "IMPALA-3629: Codegen TransferScratchTuples"

Details

    Description

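      IMPALA-3629 appears to have introduced a performance regression in the targeted-perf query https://github.com/apache/incubator-impala/blob/master/testdata/workloads/targeted-perf/queries/primitive_filter_bigint_non_selective.test.
      Interestingly, of the 6 scan primitives, this is the only one that regressed. In the profiles below, the scan node total goes from 559.157ms without the change to 1s173ms with it.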

      Scan node with change

        HDFS_SCAN_NODE (id=0):(Total: 1s173ms, non-child: 1s173ms, % non-child: 100.00%)
                ExecOption: Expr Evaluation Codegen Enabled, PARQUET Codegen Enabled, Codegen enabled: 18 out of 18
                Hdfs split stats (<volume id>:<# splits>/<split lengths>): 8:1/253.41 MB 4:2/506.80 MB 5:2/506.78 MB 7:1/253.41 MB 3:2/506.77 MB 0:2/506.77 MB 2:2/506.79 MB 1:2/506.82 MB 10:1/253.40 MB 9:2/506.79 MB 6:1/253.39 MB 
                Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 
                File Formats: PARQUET/SNAPPY:18 
                BytesRead(500.000ms): 116.63 MB, 116.63 MB, 190.83 MB
                 - AverageHdfsReadThreadConcurrency: 0.00 
                 - AverageScannerThreadConcurrency: 9.67 
                 - BytesRead: 190.83 MB (200101412)
                 - BytesReadDataNodeCache: 0
                 - BytesReadLocal: 190.83 MB (200101412)
                 - BytesReadRemoteUnexpected: 0
                 - BytesReadShortCircuit: 190.83 MB (200101412)
                 - DecompressionTime: 1s618ms
                 - MaxCompressedTextFileLength: 0
                 - NumColumns: 1 (1)
                 - NumDisksAccessed: 11 (11)
                 - NumRowGroups: 18 (18)
                 - NumScannerThreadsStarted: 11 (11)
                 - PeakMemoryUsage: 141.30 MB (148162336)
                 - PerReadThreadRawHdfsThroughput: 1.22 GB/sec
                 - RemoteScanRanges: 0 (0)
                 - RowsRead: 124.58M (124581309)
                 - RowsReturned: 124.58M (124581309)
                 - RowsReturnedRate: 106.17 M/sec
                 - ScanRangesComplete: 18 (18)
                 - ScannerThreadsInvoluntaryContextSwitches: 1.39K (1386)
                 - ScannerThreadsTotalWallClockTime: 14s246ms
                   - MaterializeTupleTime(*): 2s804ms
                   - ScannerThreadsSysTime: 650.896ms
                   - ScannerThreadsUserTime: 5s693ms
                 - ScannerThreadsVoluntaryContextSwitches: 124.53K (124533)
                 - TotalRawHdfsReadTime(*): 152.965ms
                 - TotalReadThroughput: 127.22 MB/sec
      

      Scan node without change

       HDFS_SCAN_NODE (id=0):(Total: 559.157ms, non-child: 559.157ms, % non-child: 100.00%)
                ExecOption: Expr Evaluation Codegen Enabled, Codegen enabled: 0 out of 17
                Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/253.39 MB 6:1/253.39 MB 10:1/253.38 MB 9:2/506.79 MB 7:2/506.79 MB 8:1/253.39 MB 3:2/506.77 MB 5:1/253.40 MB 1:3/760.22 MB 4:3/760.15 MB 
                Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 
                File Formats: PARQUET/SNAPPY:17 
                BytesRead(500.000ms): 116.63 MB
                 - AverageHdfsReadThreadConcurrency: 0.00 
                 - AverageScannerThreadConcurrency: 11.00 
                 - BytesRead: 180.22 MB (188973810)
                 - BytesReadDataNodeCache: 0
                 - BytesReadLocal: 180.22 MB (188973810)
                 - BytesReadRemoteUnexpected: 0
                 - BytesReadShortCircuit: 180.22 MB (188973810)
                 - DecompressionTime: 1s011ms
                 - MaxCompressedTextFileLength: 0
                 - NumColumns: 1 (1)
                 - NumDisksAccessed: 10 (10)
                 - NumRowGroups: 17 (17)
                 - NumScannerThreadsStarted: 11 (11)
                 - PeakMemoryUsage: 143.67 MB (150652704)
                 - PerReadThreadRawHdfsThroughput: 1.45 GB/sec
                 - RemoteScanRanges: 0 (0)
                 - RowsRead: 117.66M (117660198)
                 - RowsReturned: 117.66M (117660198)
                 - RowsReturnedRate: 210.42 M/sec
                 - ScanRangesComplete: 17 (17)
                 - ScannerThreadsInvoluntaryContextSwitches: 746 (746)
                 - ScannerThreadsTotalWallClockTime: 7s055ms
                   - MaterializeTupleTime(*): 3s150ms
                   - ScannerThreadsSysTime: 202.962ms
                   - ScannerThreadsUserTime: 4s693ms
                 - ScannerThreadsVoluntaryContextSwitches: 48.68K (48680)
                 - TotalRawHdfsReadTime(*): 121.070ms
                 - TotalReadThroughput: 233.25 MB/sec
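
The two runs scanned different numbers of splits and rows (18 row groups vs. 17), so the raw counters are not directly comparable. A minimal sketch of one way to normalize them per million rows read is shown here. The counter values are copied from the two profiles above; the helper names (`parse_duration_ms`, `per_million_rows`, `compare`) are hypothetical, and the unit table is an assumption, since only `s` and `ms` appear in these profiles.

```python
import re

# Milliseconds per unit. Only "s" and "ms" occur in the profiles above;
# the remaining units are an assumption about the profile's duration format.
_UNIT_MS = {"h": 3_600_000.0, "m": 60_000.0, "s": 1_000.0,
            "ms": 1.0, "us": 1e-3, "ns": 1e-6}


def parse_duration_ms(text):
    """Convert a duration such as '1s173ms' or '559.157ms' to milliseconds."""
    total = 0.0
    # Two-letter units are listed first so "ms" is not split into "m" + "s".
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)", text):
        total += float(value) * _UNIT_MS[unit]
    return total


# Counters copied from "Scan node with change".
WITH_CHANGE = {
    "RowsRead": 124581309,
    "Total": "1s173ms",
    "ScannerThreadsTotalWallClockTime": "14s246ms",
    "MaterializeTupleTime": "2s804ms",
    "DecompressionTime": "1s618ms",
    "ScannerThreadsVoluntaryContextSwitches": 124533,
}

# Counters copied from "Scan node without change".
WITHOUT_CHANGE = {
    "RowsRead": 117660198,
    "Total": "559.157ms",
    "ScannerThreadsTotalWallClockTime": "7s055ms",
    "MaterializeTupleTime": "3s150ms",
    "DecompressionTime": "1s011ms",
    "ScannerThreadsVoluntaryContextSwitches": 48680,
}


def per_million_rows(profile):
    """Divide every counter by the number of rows read, in millions."""
    rows_m = profile["RowsRead"] / 1e6
    normalized = {}
    for name, value in profile.items():
        if name == "RowsRead":
            continue
        # Strings are durations; plain numbers are event counts.
        amount = parse_duration_ms(value) if isinstance(value, str) else float(value)
        normalized[name] = amount / rows_m
    return normalized


def compare(new, old):
    """Return new/old ratios of the per-million-row counters."""
    new_n, old_n = per_million_rows(new), per_million_rows(old)
    return {name: new_n[name] / old_n[name] for name in new_n}


ratios = compare(WITH_CHANGE, WITHOUT_CHANGE)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x per row with the change")
```

A ratio above 1 means the counter costs more per row with the change. On these numbers the scan node total and the scanner wall-clock time roughly double per row, while MaterializeTupleTime per row is lower with the change, which may suggest the extra time is spent outside tuple materialization. This is a single pair of runs, so the ratios are indicative only.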
      

      Attachments

        1. primitive_filter_bigint_non_selective.txt
          234 kB
          Mostafa Mokhtar
        2. filter_bigint_non_selective_post_1137_IMPALA-3629_3x_1.zip
          2.69 MB
          Mostafa Mokhtar
        3. benchmark_report_full.txt
          47 kB
          Michael Ho
        4. primitive_filter_bigint_non_selective.txt
          237 kB
          Jim Apple

            People

              kwho Michael Ho
              mmokhtar Mostafa Mokhtar