IMPALA-4719

What is the difference between Kudu and Parquet when running 'query7.sql' of TPC-DS?

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: Kudu_Impala
    • Fix Version/s: Product Backlog
    • Component/s: Backend
    • Labels: None
    • Environment: impala_kudu: 2.8.0
      kudu: 1.0.1

    Description

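      Hi everybody,
      I am benchmarking Impala on Kudu against Impala on Parquet with TPC-DS, and the results are not what I hoped for. I picked one query (query7.sql) and collected its profiles, which are in the attachments. I notice some differences between the two scan nodes, in particular in RowsRead, RowsReturned, RowsReturnedRate, and the runtime filter counters, but I don't know why they occur. Could anybody give me some tips? Thanks in advance.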

      Line2233     KUDU_SCAN_NODE (id=0):(Total: 4m58s, non-child: 4m58s, % non-child: 100.00%)
                BytesRead(8s000ms): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
                 - BytesRead: 0
                 - KuduRemoteScanTokens: 0 (0)
                 - NumScannerThreadsStarted: 2 (2)
                 - PeakMemoryUsage: 1.62 MB (1695744)
                 - RowsRead: 288.01M (288006145)
                 - RowsReturned: 288.01M (288006145)
                 - RowsReturnedRate: 964.33 K/sec
                 - ScanRangesComplete: 2 (2)
                 - ScannerThreadsInvoluntaryContextSwitches: 15.56K (15562)
                 - ScannerThreadsTotalWallClockTime: 10m48s
                   - MaterializeTupleTime(*): 4m58s
                   - ScannerThreadsSysTime: 10s600ms
                   - ScannerThreadsUserTime: 1m34s
                   - TotalKuduReadTime: 9m6s
                 - ScannerThreadsVoluntaryContextSwitches: 18.48K (18477)
                 - TotalKuduScanRoundTrips: 16.74K (16743)
                 - TotalReadThroughput: 0.00 /sec
      
      
      Line6407     HDFS_SCAN_NODE (id=0):(Total: 1s983ms, non-child: 1s983ms, % non-child: 100.00%)
                ExecOption: Expr Evaluation Codegen Disabled, PARQUET Codegen Enabled, Codegen enabled: 27 out of 27
                Hdfs split stats (<volume id>:<# splits>/<split lengths>): 8:1/252.88 MB 5:1/252.90 MB 1:2/505.78 MB 0:2/505.76 MB 4:2/505.75 MB 2:3/758.66 MB 10:2/505.76 MB 6:3/691.47 MB 3:4/1011.57 MB 11:2/505.79 MB 7:2/505.74 MB 9:3/758.67 MB 
                Runtime filters: All filters arrived. Waited 41ms
                Hdfs Read Thread Concurrency Bucket: 0:28.57% 1:0% 2:14.29% 3:0% 4:14.29% 5:0% 6:0% 7:0% 8:0% 9:0% 10:14.29% 11:14.29% 12:14.29% 13:0% 14:0% 15:0% 
                File Formats: PARQUET/SNAPPY:216 
                BytesRead(500.000ms): 0, 0, 258.55 MB, 686.05 MB, 1.08 GB, 1.55 GB, 1.72 GB, 1.87 GB, 1.87 GB
                 - AverageHdfsReadThreadConcurrency: 5.57 
                 - AverageScannerThreadConcurrency: 22.00 
                 - BytesRead: 1.87 GB (2003844897)
                - BytesReadDataNodeCache: 0
                 - BytesReadLocal: 1.87 GB (2003844897)
                 - BytesReadRemoteUnexpected: 0
                 - BytesReadShortCircuit: 0
                 - DecompressionTime: 4s856ms
                 - MaxCompressedTextFileLength: 0
                 - NumColumns: 8 (8)
                 - NumDisksAccessed: 12 (12)
                 - NumRowGroups: 27 (27)
                 - NumScannerThreadsStarted: 27 (27)
                 - PeakMemoryUsage: 1.93 GB (2072175360)
                 - PerReadThreadRawHdfsThroughput: 95.97 MB/sec
                 - RemoteScanRanges: 0 (0)
                 - RowsRead: 148.93M (148934063)
                 - RowsReturned: 388.11K (388105)
                 - RowsReturnedRate: 195.71 K/sec
                 - ScanRangesComplete: 27 (27)
                 - ScannerThreadsInvoluntaryContextSwitches: 3.80K (3802)
                 - ScannerThreadsTotalWallClockTime: 1m17s
                   - MaterializeTupleTime(*): 32s642ms
                   - ScannerThreadsSysTime: 2s220ms
                   - ScannerThreadsUserTime: 35s270ms
                 - ScannerThreadsVoluntaryContextSwitches: 2.82K (2820)
                 - TotalRawHdfsReadTime(*): 19s913ms
                 - TotalReadThroughput: 424.67 MB/sec
                Filter 0 (1.00 MB):
                   - Rows processed: 442.34K (442341)
                   - Rows rejected: 0 (0)
                   - Rows total: 442.37K (442368)
                Filter 1 (1.00 MB):
                   - Rows processed: 442.34K (442341)
                   - Rows rejected: 21.63K (21630)
                   - Rows total: 442.37K (442368)
                Filter 2 (1.00 MB):
                   - Rows processed: 148.91M (148912433)
                   - Rows rejected: 120.78M (120776101)
                   - Rows total: 148.91M (148912433)
                Filter 3 (1.00 MB):
                   - Rows processed: 28.14M (28136332)
                   - Rows rejected: 27.75M (27748227)
                   - Rows total: 28.14M (28136332)
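
      To make the two excerpts easier to compare, here is a minimal Python sketch that works only from the counter values quoted above. It is not part of Impala or of the attached profiles. The helper names (parse_counters, returned_fraction, rejection_rate) are hypothetical, and the regular expression assumes the "- Name: pretty (raw)" counter layout seen in these excerpts.

```python
import re

# Counter lines copied from the KUDU_SCAN_NODE and HDFS_SCAN_NODE excerpts.
KUDU_SCAN = """
- RowsRead: 288.01M (288006145)
- RowsReturned: 288.01M (288006145)
"""
HDFS_SCAN = """
- RowsRead: 148.93M (148934063)
- RowsReturned: 388.11K (388105)
"""

# Runtime filters listed under the HDFS scan node:
# filter id -> (rows processed, rows rejected), copied from the excerpt.
HDFS_FILTERS = {
    0: (442341, 0),
    1: (442341, 21630),
    2: (148912433, 120776101),
    3: (28136332, 27748227),
}

# Assumed layout: "- CounterName: <pretty value> (<raw integer>)".
COUNTER_RE = re.compile(r"-\s*(\w+):\s*[^()\n]*\((\d+)\)")


def parse_counters(text):
    """Return {counter name: raw integer} for each counter line in text."""
    return {name: int(raw) for name, raw in COUNTER_RE.findall(text)}


def returned_fraction(counters):
    """Fraction of rows read by the scan node that it passed downstream."""
    return counters["RowsReturned"] / counters["RowsRead"]


def rejection_rate(processed, rejected):
    """Fraction of processed rows that a runtime filter rejected."""
    return rejected / processed if processed else 0.0


if __name__ == "__main__":
    for label, text in (("Kudu", KUDU_SCAN), ("Parquet", HDFS_SCAN)):
        fraction = returned_fraction(parse_counters(text))
        print(f"{label}: returned {fraction:.4%} of rows read")
    for filter_id, (processed, rejected) in sorted(HDFS_FILTERS.items()):
        rate = rejection_rate(processed, rejected)
        print(f"Filter {filter_id}: rejected {rate:.2%} of processed rows")
```

      With the numbers above, the rows that survive Filter 2 (148912433 - 120776101 = 28136332) equal the rows processed by Filter 3, and the rows that survive Filter 3 (28136332 - 27748227 = 388105) equal RowsReturned for the HDFS scan node. The Kudu excerpt lists no runtime filters, and its RowsReturned equals its RowsRead. This arithmetic only describes the excerpts; it does not by itself establish why the two plans differ.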

      Attachments

        1. query7.sql
          0.6 kB
          helifu
        2. impala&parquet.profile
          367 kB
          helifu
        3. impala&kudu.profile
          319 kB
          helifu

          People

            Assignee: Unassigned
            Reporter: helifu