Description
RowReaderImpl::computeBatchSize() can become a hot path when search arguments (sargs) are present. The perf report shows that orc::RowReaderImpl::next() itself accounts for about 1/4 of the scan time. This was measured using orc-scan with the sarg "inv_quantity_on_hand between -1 and 5000", scanning 4 ORC files of the TPCDS inventory table (768.23MB in total size).
The disassembly shows that the time is spent in a loop. The annotation indicates that this loop is the inlined RowReaderImpl::computeBatchSize() method. Disassembly:
       │ d0:┌─→mov   %r14,%r15
  0.36 │    │  mov   %esi,%ecx
  0.13 │    │  shr   $0x6,%rdx
 22.81 │    │  shl   %cl,%r15
 24.24 │    │  test  %r15,(%r9,%rdx,8)
       │    │↓ je    fb
       │ e2:│  lea   0x1(%rsi),%edx
  0.22 │    │  mov   %r10,%rax
  0.18 │    │  imul  %rdx,%rax
 25.31 │    │  mov   %rdx,%rsi
       │    │  cmp   %rdi,%rax
  0.54 │    │  cmova %rdi,%rax
  0.04 │    ├──cmp   %r11,%rdx
 23.79 │    └──jb    d0
  0.31 │ fb:   sub   %r8,%rax
The corresponding loop:
    endRowInStripe = currentRowInStripe;
    uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
    for (; rg < includedRowGroups.size(); ++rg) {
      if (!includedRowGroups[rg]) {
        break;
      } else {
        endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
      }
    }
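For anyone who wants to experiment with the loop outside the reader, here is a minimal standalone sketch of it. The free function computeEndRowInStripe and its parameter list are hypothetical (they are not part of the ORC API), and it assumes includedRowGroups is a std::vector<bool>, which the shr $0x6 / shl / test sequence in the disassembly suggests but which should be checked against the source. Each call walks forward from the current row group until it reaches the first excluded one, so the cost per call grows with the length of the run of selected row groups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the loop quoted above.
// Assumption: includedRowGroups is a bit-packed std::vector<bool>.
uint64_t computeEndRowInStripe(uint64_t currentRowInStripe,
                               uint64_t rowsInCurrentStripe,
                               uint64_t rowIndexStride,
                               const std::vector<bool>& includedRowGroups) {
  uint64_t endRowInStripe = currentRowInStripe;
  // Row group that contains the current row.
  uint32_t rg = static_cast<uint32_t>(currentRowInStripe / rowIndexStride);
  for (; rg < includedRowGroups.size(); ++rg) {
    if (!includedRowGroups[rg]) {
      // First excluded row group ends the contiguous readable range.
      break;
    }
    // Extend the range to the end of this row group, capped at the stripe size.
    endRowInStripe = std::min(rowsInCurrentStripe, (rg + 1) * rowIndexStride);
  }
  return endRowInStripe;
}
```

With a stride of 10000 and row groups {selected, selected, skipped, selected} in a 35000-row stripe, a call starting at row 0 returns 20000 and a call starting at row 30000 returns 35000.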