Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3346

Kudu scanner : Improve perf of DecodeRowsIntoRowBatch

    Details

      Description

      For kudu scans that return a lot of rows KuduScanner::DecodeRowsIntoRowBatch gets fairly expensive as it populate RowBatch one slot at a time opposed to column by column.

       for (int krow_idx = rows_scanned_current_block_; krow_idx < num_rows; ++krow_idx) {
          // Clear any NULL indicators set by a previous iteration.
          (*tuple_mem)->Init(tuple_num_null_bytes_);
      
          // Transform a Kudu row into an Impala row.
          KuduScanBatch::RowPtr krow = cur_kudu_batch_.Row(krow_idx);
          RETURN_IF_ERROR(KuduRowToImpalaTuple(krow, row_batch, *tuple_mem));
          ++rows_scanned_current_block_;
      

      Then KuduRowToImpalaTuple ends up going through the case statement per row per column.

      Status KuduScanner::KuduRowToImpalaTuple(const KuduScanBatch::RowPtr& row,
          RowBatch* row_batch, Tuple* tuple) {
        for (int i = 0; i < scan_node_->tuple_desc_->slots().size(); ++i) {
          const SlotDescriptor* info = scan_node_->tuple_desc_->slots()[i];
          void* slot = tuple->GetSlot(info->tuple_offset());
      
          if (row.IsNull(i)) {
            SetSlotToNull(tuple, *info);
            continue;
          }
      
          int max_len = -1;
          switch (info->type().type) {
            case TYPE_VARCHAR:
              max_len = info->type().len;
              DCHECK_GT(max_len, 0);
              // Fallthrough intended.
            case TYPE_STRING: {
              // For types with auxiliary memory (String, Binary,...) store the original memory
              // location in the tuple to avoid the copy when the conjuncts do not pass. Relocate
              // the memory into the row batch's memory in a later step.
              kudu::Slice slice;
              KUDU_RETURN_IF_ERROR(row.GetString(i, &slice),
                  "Error getting column value from Kudu.");
              StringValue* sv = reinterpret_cast<StringValue*>(slot);
              sv->ptr = const_cast<char*>(reinterpret_cast<const char*>(slice.data()));
              sv->len = static_cast<int>(slice.size());
              if (max_len > 0) sv->len = std::min(sv->len, max_len);
              break;
            }
            case TYPE_TINYINT:
              KUDU_RETURN_IF_ERROR(row.GetInt8(i, reinterpret_cast<int8_t*>(slot)),
                  "Error getting column value from Kudu.");
              break;
      

      Based on Vtune the scanner should be ~20% faster if rows are populate column by column.

      Alexander Behm is doing similar work for the Parquet scanner
      http://github.mtv.cloudera.com/abehm/Impala/commit/1f57ea4555e0fb6e652cd7ea5d15154688912693#diff-41a2b64668ab8fa1e5d059aeeece8e23R310

        Issue Links

          Activity

          Hide
          mmokhtar Mostafa Mokhtar added a comment -
          Show
          mmokhtar Mostafa Mokhtar added a comment - Todd Lipcon FYI
          Hide
          mmokhtar Mostafa Mokhtar added a comment -

          From Vtune

          Function Stack	CPU Time: Total	CPU Time: Self	Module	Function (Full)	Source File	Start Address
          KuduScanner::GetNext	71.5%	0.0%	kudu-scanner.cc	0xc093a0
            KuduScanner::DecodeRowsIntoRowBatch	67.9%	8.6%	kudu-scanner.cc	0xc08c70
              KuduScanner::KuduRowToImpalaTuple	34.8%	13.5%	kudu-scanner.cc	0xc08210
              KuduScanner::RelocateValuesFromKudu	6.1%	5.8%	kudu-scanner.cc	0xc07fc0
              Tuple::Init	5.7%	1.2%		tuple.h	0xc08d10
              Tuple::Init	4.5%	0.2%		tuple.h	0xc08df7
              kudu::client::KuduScanBatch::Row	2.8%	1.3%	scan_batch.cc	0x6d9c0
              ExecNode::ReachedLimit	1.3%	1.3%	exec-node.h	0xc08dd1
              KuduScanner::next_tuple	0.9%	0.9%	kudu-scanner.h	0xc08deb
          

          Query used

          select count(*)
          FROM lineitem
          WHERE l_orderkey > 20
          

          Profile

          Operator       #Hosts   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail        
          --------------------------------------------------------------------------------------------------------
          03:AGGREGATE        1  321.836ms  321.836ms       1           1  48.00 KB        -1.00 B  FINALIZE      
          02:EXCHANGE         1  151.612us  151.612us       9           1         0        -1.00 B  UNPARTITIONED 
          01:AGGREGATE        9       1m6s      1m22s       9           1  20.00 KB       10.00 MB                
          00:SCAN KUDU        9      2m23s      2m48s  90.00B          -1   3.16 MB              0                
          
          Show
          mmokhtar Mostafa Mokhtar added a comment - From Vtune Function Stack CPU Time: Total CPU Time: Self Module Function (Full) Source File Start Address KuduScanner::GetNext 71.5% 0.0% kudu-scanner.cc 0xc093a0 KuduScanner::DecodeRowsIntoRowBatch 67.9% 8.6% kudu-scanner.cc 0xc08c70 KuduScanner::KuduRowToImpalaTuple 34.8% 13.5% kudu-scanner.cc 0xc08210 KuduScanner::RelocateValuesFromKudu 6.1% 5.8% kudu-scanner.cc 0xc07fc0 Tuple::Init 5.7% 1.2% tuple.h 0xc08d10 Tuple::Init 4.5% 0.2% tuple.h 0xc08df7 kudu::client::KuduScanBatch::Row 2.8% 1.3% scan_batch.cc 0x6d9c0 ExecNode::ReachedLimit 1.3% 1.3% exec-node.h 0xc08dd1 KuduScanner::next_tuple 0.9% 0.9% kudu-scanner.h 0xc08deb Query used select count(*) FROM lineitem WHERE l_orderkey > 20 Profile Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail -------------------------------------------------------------------------------------------------------- 03:AGGREGATE 1 321.836ms 321.836ms 1 1 48.00 KB -1.00 B FINALIZE 02:EXCHANGE 1 151.612us 151.612us 9 1 0 -1.00 B UNPARTITIONED 01:AGGREGATE 9 1m6s 1m22s 9 1 20.00 KB 10.00 MB 00:SCAN KUDU 9 2m23s 2m48s 90.00B -1 3.16 MB 0
          Hide
          mmokhtar Mostafa Mokhtar added a comment -

          David Alves
          Can you please share the patch which addressed the kudu -> Impala batch conversion?

          Show
          mmokhtar Mostafa Mokhtar added a comment - David Alves Can you please share the patch which addressed the kudu -> Impala batch conversion?
          Hide
          mjacobs Matthew Jacobs added a comment -

          David Alves I'm coming into this late. What is your patch? I'm trying to understand what work there is here for us.

          Show
          mjacobs Matthew Jacobs added a comment - David Alves I'm coming into this late. What is your patch? I'm trying to understand what work there is here for us.
          Hide
          alex.behm Alexander Behm added a comment -

          commit 9f3f4b713d78e8dcf02c02def447195a04f408e6
          Author: Alex Behm <alex.behm@cloudera.com>
          Date: Mon Oct 24 20:46:34 2016 -0700

          IMPALA-3346: DeepCopy() Kudu rows into Impala tuples.

          Implements additional changes to make the memory layout
          of Kudu rows identical to Impala tuples.
          In particular, Kudu rows allocate a null bit even for
          non-nullable columns, and Impala now does the same
          for Kudu scan tuples.

          This change exploits the now-identical Kudu and Impala
          tuple layouts to avoid the expensive translation.

          Perf: Mostafa reported a 50% efficiency gain on full
          table scans.

          Testing: A private core/hdfs run passed.

          TODO:
          1) Test cases with nullable/nonnullable non-PK slots.
          2) Specify mem layout to client (depends on KUDU-1694)
          3) Avoid mem copies (depends on KUDU-1695)

          Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb
          Reviewed-on: http://gerrit.cloudera.org:8080/4862
          Reviewed-by: Dan Hecht <dhecht@cloudera.com>
          Tested-by: Marcel Kornacker <marcel@cloudera.com>

          Show
          alex.behm Alexander Behm added a comment - commit 9f3f4b713d78e8dcf02c02def447195a04f408e6 Author: Alex Behm <alex.behm@cloudera.com> Date: Mon Oct 24 20:46:34 2016 -0700 IMPALA-3346 : DeepCopy() Kudu rows into Impala tuples. Implements additional changes to make the memory layout of Kudu rows identical to Impala tuples. In particular, Kudu rows allocate a null bit even for non-nullable columns, and Impala now does the same for Kudu scan tuples. This change exploits the now-identical Kudu and Impala tuple layouts to avoid the expensive translation. Perf: Mostafa reported a 50% efficiency gain on full table scans. Testing: A private core/hdfs run passed. TODO: 1) Test cases with nullable/nonnullable non-PK slots. 2) Specify mem layout to client (depends on KUDU-1694 ) 3) Avoid mem copies (depends on KUDU-1695 ) Change-Id: Ic911e4eff9fe98bf28d8a1bab5c9d7e9ab66d9cb Reviewed-on: http://gerrit.cloudera.org:8080/4862 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Tested-by: Marcel Kornacker <marcel@cloudera.com>

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development