Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.7.0
-
None
Description
KuduScanBatch::RowPtr::Get
gets called by the Impala scanner per row, per column and RowPtr::Get has check on type which not necessarily needed, this burns about 10% of the overall CPU.
Also similar to IMPALA-3347, it would help if the Batch has a column level nullability bit to skip the NULL check.
template<typename T> Status KuduScanBatch::RowPtr::Get(int col_idx, typename T::cpp_type* val) const { const ColumnSchema& col = schema_->column(col_idx); if (PREDICT_FALSE(col.type_info()->type() != T::type)) { // TODO: at some point we could allow type coercion here. return Status::InvalidArgument( Substitute("invalid type $0 provided for column '$1' (expected $2)", T::name(), col.name(), col.type_info()->name())); } if (col.is_nullable() && IsNull(col_idx)) { return Status::NotFound("column is NULL"); } memcpy(val, row_data_ + schema_->column_offset(col_idx), sizeof(*val)); return Status::OK(); }
From perf
Samples: 182K of event 'cycles', Event count (approx.): 65636591846 11.59% libkudu_client.so.0.1.0 [.] kudu::client::KuduScanBatch::RowPtr::GetInt64(int, long*) const 10.03% impalad [.] impala::KuduScanner::KuduRowToImpalaTuple(kudu::client::KuduScanBatch::RowPtr const&, impala::RowBatch*, impala::Tuple*) 7.28% impalad [.] impala::KuduScanner::DecodeRowsIntoRowBatch(impala::RowBatch*, impala::Tuple**, bool*) 3.69% [kernel] [k] copy_user_enhanced_fast_string 3.26% kudu-tserver [.] kudu::SerializeRowBlock(kudu::RowBlock const&, kudu::RowwiseRowBlockPB*, kudu::Schema const*, kudu::faststring*, kudu::faststring*) 3.01% libkudu_client.so.0.1.0 [.] kudu::client::KuduScanBatch::RowPtr::IsNull(int) const
And this shows that most of the cost is from the check before the memcpy
kudu::client::KuduScanBatch::RowPtr::GetInt64(int, long*) const /opt/cloudera/parcels/KUDU-0.7.1-1.kudu0.7.1.p0.36/lib64/libkudu_client.so.0.1.0 2.37 │ mov %rdi,%r12 0.42 │ sub $0xd8,%rsp 2.62 │ mov (%rsi),%r8 2.37 │ lea (%rbx,%rbx,8),%rsi 0.46 │ mov (%r8),%rax 0.61 │ lea (%rax,%rsi,8),%r9 2.76 │ mov 0x8(%r9),%rax 6.81 │ cmpl $0x7,(%rax) 12.14 │ ↓ jne e8 2.57 │ cmpb $0x0,0x10(%r9) 0.57 │ ↓ jne 68 1.95 │ 3b: mov 0x40(%r8),%rax 0.68 │ mov (%rax,%rbx,8),%rax 2.92 │ add 0x8(%r13),%rax 0.75 │ mov (%rax),%rax 32.71 │ mov %rax,(%rcx) 2.62 │ movq $0x0,(%r12) 0.52 │ 55: lea -0x18(%rbp),%rsp 0.21 │ mov %r12,%rax 2.13 │ pop %rbx 2.48 │ pop %r12 0.52 │ pop %r13 0.22 │ pop %rbp 2.46 │ ← retq │ nop │ 68: mov %edx,%esi │ mov %r13,%rdi │ mov %rcx,-0xe8(%rbp) │ → callq kudu::client::KuduScanBatch::RowPtr::IsNull(int) const@plt │ test %al,%al