IMPALA / IMPALA-3347

Kudu scanner: expensive per-row, per-column IsNull check

    Details

      Description

      Kudu should annotate each column in a batch with whether it is nullable. Today the scanner checks, for every row and every column of a Kudu batch, whether the slot is null. It would be much more efficient to store a per-column bit in the KuduScanBatch indicating the nullability of each column, so that the per-row check can be skipped for columns that can never be null.

      Status KuduScanner::KuduRowToImpalaTuple(const KuduScanBatch::RowPtr& row,
          RowBatch* row_batch, Tuple* tuple) {
        for (int i = 0; i < scan_node_->tuple_desc_->slots().size(); ++i) {
          const SlotDescriptor* info = scan_node_->tuple_desc_->slots()[i];
          void* slot = tuple->GetSlot(info->tuple_offset());
      
          if (row.IsNull(i)) {
            SetSlotToNull(tuple, *info);
            continue;
          }
      
          int max_len = -1;
          switch (info->type().type) {
            case TYPE_VARCHAR:
              max_len = info->type().len;
              DCHECK_GT(max_len, 0);
      

      For a basic scan, the null check consumes 4% of the CPU cycles.
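
      The idea can be illustrated with a minimal sketch. It does not use the Kudu client API or Impala's scanner code: ColumnBatch, Slot, and MaterializeColumn are hypothetical stand-ins, and a columnar layout is assumed only to keep the example short. The point is that the nullability flag is read once per column, so the per-row null test runs only for columns that are actually nullable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one column of a scan batch (not the Kudu API).
struct ColumnBatch {
  bool nullable;                // the per-column annotation proposed here
  std::vector<int64_t> values;  // one value per row
  std::vector<bool> is_null;    // per-row null flags; read only if nullable
};

// Hypothetical stand-in for a materialized tuple slot.
struct Slot {
  bool is_null;
  int64_t value;
};

// Copies every row of `col` into `out` (one Slot per row).
// Returns the number of per-row null checks that were executed.
std::size_t MaterializeColumn(const ColumnBatch& col, std::vector<Slot>* out) {
  assert(out->size() == col.values.size());

  if (!col.nullable) {
    // Fast path: the column can never be null, so no per-row check is needed.
    for (std::size_t row = 0; row < col.values.size(); ++row) {
      (*out)[row].is_null = false;
      (*out)[row].value = col.values[row];
    }
    return 0;
  }

  // Slow path: the column is nullable, so each row must be tested.
  assert(col.is_null.size() == col.values.size());
  std::size_t null_checks = 0;
  for (std::size_t row = 0; row < col.values.size(); ++row) {
    ++null_checks;
    if (col.is_null[row]) {
      (*out)[row].is_null = true;
      (*out)[row].value = 0;
      continue;
    }
    (*out)[row].is_null = false;
    (*out)[row].value = col.values[row];
  }
  return null_checks;
}
```

      In this sketch a non-nullable column costs zero null checks per batch, while a nullable column still costs one per row. How much of the reported overhead this would recover in the real scanner depends on how many scanned columns are declared non-nullable.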

              People

              • Assignee: Matthew Jacobs (mjacobs)
              • Reporter: Mostafa Mokhtar (mmokhtar)