[IMPALA-3346] Kudu scanner : Improve perf of DecodeRowsIntoRowBatch - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.6.0
Fix Version/s: Impala 2.8.0
Component/s: Backend
Labels:
- kudu
- performance

Target Version: Kudu_Impala

Description

For Kudu scans that return a lot of rows, KuduScanner::DecodeRowsIntoRowBatch gets fairly expensive because it populates the RowBatch one slot at a time (row by row) as opposed to column by column.

 for (int krow_idx = rows_scanned_current_block_; krow_idx < num_rows; ++krow_idx) {
    // Clear any NULL indicators set by a previous iteration.
    (*tuple_mem)->Init(tuple_num_null_bytes_);

    // Transform a Kudu row into an Impala row.
    KuduScanBatch::RowPtr krow = cur_kudu_batch_.Row(krow_idx);
    RETURN_IF_ERROR(KuduRowToImpalaTuple(krow, row_batch, *tuple_mem));
    ++rows_scanned_current_block_;

KuduRowToImpalaTuple then ends up going through the type switch statement once per row per column.

Status KuduScanner::KuduRowToImpalaTuple(const KuduScanBatch::RowPtr& row,
    RowBatch* row_batch, Tuple* tuple) {
  for (int i = 0; i < scan_node_->tuple_desc_->slots().size(); ++i) {
    const SlotDescriptor* info = scan_node_->tuple_desc_->slots()[i];
    void* slot = tuple->GetSlot(info->tuple_offset());

    if (row.IsNull(i)) {
      SetSlotToNull(tuple, *info);
      continue;
    }

    int max_len = -1;
    switch (info->type().type) {
      case TYPE_VARCHAR:
        max_len = info->type().len;
        DCHECK_GT(max_len, 0);
        // Fallthrough intended.
      case TYPE_STRING: {
        // For types with auxiliary memory (String, Binary,...) store the original memory
        // location in the tuple to avoid the copy when the conjuncts do not pass. Relocate
        // the memory into the row batch's memory in a later step.
        kudu::Slice slice;
        KUDU_RETURN_IF_ERROR(row.GetString(i, &slice),
            "Error getting column value from Kudu.");
        StringValue* sv = reinterpret_cast<StringValue*>(slot);
        sv->ptr = const_cast<char*>(reinterpret_cast<const char*>(slice.data()));
        sv->len = static_cast<int>(slice.size());
        if (max_len > 0) sv->len = std::min(sv->len, max_len);
        break;
      }
      case TYPE_TINYINT:
        KUDU_RETURN_IF_ERROR(row.GetInt8(i, reinterpret_cast<int8_t*>(slot)),
            "Error getting column value from Kudu.");
        break;

Based on VTune profiling, the scanner should be ~20% faster if rows are populated column by column.
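To illustrate the idea, here is a minimal, self-contained sketch of hoisting the type switch out of the row loop. It is not the Impala or Kudu code: ColType, ColumnDesc, Layout, CopyValue, CopyColumn, DecodeRowByRow and DecodeColumnByColumn are all hypothetical names, only fixed-width integer columns are handled, and NULL handling and string relocation are left out. It assumes the source rows are stored row-wise with a fixed stride.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the Impala or Kudu types.
enum class ColType { kInt8, kInt32, kInt64 };

struct ColumnDesc {
  ColType type;
  size_t src_offset;  // byte offset of the value within a source row
  size_t dst_offset;  // byte offset of the slot within a destination tuple
};

struct Layout {
  size_t src_row_size;    // stride between consecutive source rows
  size_t dst_tuple_size;  // stride between consecutive destination tuples
};

// Copies one fixed-width value; the switch mirrors a per-slot type dispatch.
inline void CopyValue(ColType type, const uint8_t* src, uint8_t* dst) {
  switch (type) {
    case ColType::kInt8:  std::memcpy(dst, src, sizeof(int8_t));  break;
    case ColType::kInt32: std::memcpy(dst, src, sizeof(int32_t)); break;
    case ColType::kInt64: std::memcpy(dst, src, sizeof(int64_t)); break;
  }
}

// Row at a time: the type switch runs num_rows * cols.size() times.
void DecodeRowByRow(const std::vector<ColumnDesc>& cols, const Layout& layout,
    const uint8_t* src, size_t num_rows, uint8_t* dst) {
  for (size_t r = 0; r < num_rows; ++r) {
    for (const ColumnDesc& c : cols) {
      CopyValue(c.type, src + r * layout.src_row_size + c.src_offset,
          dst + r * layout.dst_tuple_size + c.dst_offset);
    }
  }
}

// Tight loop over one column; the value width is a compile-time constant.
template <typename T>
void CopyColumn(const ColumnDesc& c, const Layout& layout, const uint8_t* src,
    size_t num_rows, uint8_t* dst) {
  for (size_t r = 0; r < num_rows; ++r) {
    std::memcpy(dst + r * layout.dst_tuple_size + c.dst_offset,
        src + r * layout.src_row_size + c.src_offset, sizeof(T));
  }
}

// Column at a time: the type switch runs only cols.size() times.
void DecodeColumnByColumn(const std::vector<ColumnDesc>& cols,
    const Layout& layout, const uint8_t* src, size_t num_rows, uint8_t* dst) {
  for (const ColumnDesc& c : cols) {
    switch (c.type) {
      case ColType::kInt8:
        CopyColumn<int8_t>(c, layout, src, num_rows, dst);
        break;
      case ColType::kInt32:
        CopyColumn<int32_t>(c, layout, src, num_rows, dst);
        break;
      case ColType::kInt64:
        CopyColumn<int64_t>(c, layout, src, num_rows, dst);
        break;
    }
  }
}
```

Both functions write identical tuples; the difference is only how often the type dispatch executes. Whether this yields the estimated speedup in the real scanner would need to be measured there.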

alex.behm is doing similar work for the Parquet scanner:
http://github.mtv.cloudera.com/abehm/Impala/commit/1f57ea4555e0fb6e652cd7ea5d15154688912693#diff-41a2b64668ab8fa1e5d059aeeece8e23R310

Issue Links

is related to

IMPALA-3348 Kudu scanner : Avoid per slot check of Vector size in KuduScanner::RelocateValuesFromKudu

Resolved

IMPALA-3349 Kudu scanner : KuduScanBatch::RowPtr::Get* has very expensive checks in the hot path

Resolved

relates to

IMPALA-3347 Kudu scanner : Expensive per row per column IsNull check

Resolved

People

Assignee: Alexander Behm

Reporter: Mostafa Mokhtar

Votes: 0

Watchers: 5

Dates

Created: 13/Apr/16 22:51

Updated: 01/Nov/16 20:34

Resolved: 01/Nov/16 20:34