Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3347

Kudu scanner : Expensive per row per column IsNull check

    Details

      Description

      Kudu should annotate each column in the batch if it is nullable, as today per row per column from a kudu batch the scanner checks if the slot is null, it would be much more efficient to store a per column bit in the KuduScanBatch indicating nullability of a column.

      Status KuduScanner::KuduRowToImpalaTuple(const KuduScanBatch::RowPtr& row,
          RowBatch* row_batch, Tuple* tuple) {
        for (int i = 0; i < scan_node_->tuple_desc_->slots().size(); ++i) {
          const SlotDescriptor* info = scan_node_->tuple_desc_->slots()[i];
          void* slot = tuple->GetSlot(info->tuple_offset());
      
          if (row.IsNull(i)) {
            SetSlotToNull(tuple, *info);
            continue;
          }
      
          int max_len = -1;
          switch (info->type().type) {
            case TYPE_VARCHAR:
              max_len = info->type().len;
              DCHECK_GT(max_len, 0);
      

      For a basic scan null check consumes 4% of the CPU cycles.

        Issue Links

          Activity

          Hide
          mjacobs Matthew Jacobs added a comment -

          Todd Lipcon, for our own planning purposes: is this something you think you'd be able to support on the Kudu side for 1.0?

          Show
          mjacobs Matthew Jacobs added a comment - Todd Lipcon , for our own planning purposes: is this something you think you'd be able to support on the Kudu side for 1.0?
          Hide
          mjacobs Matthew Jacobs added a comment -

          Mostafa Mokhtar, just to be clear: we can know the nullability of a column from the schema. Do you mean that Kudu should indicate in the row batch if any values are actually null? For a non-nullable column, that would always be false, but for a nullable column that would be set if there are nulls or not.

          Show
          mjacobs Matthew Jacobs added a comment - Mostafa Mokhtar , just to be clear: we can know the nullability of a column from the schema. Do you mean that Kudu should indicate in the row batch if any values are actually null? For a non-nullable column, that would always be false, but for a nullable column that would be set if there are nulls or not.
          Hide
          mmokhtar Mostafa Mokhtar added a comment -

          Matthew Jacobs
          Both.
          Null checks in KuduRowToImpalaTuple are very expensive, row.IsNull should be skipped when operating on 1) non-nullable columns 2) Column that doesn't have nulls in the current batch 3) Columns involved in inner non-null safe joins

          Show
          mmokhtar Mostafa Mokhtar added a comment - Matthew Jacobs Both. Null checks in KuduRowToImpalaTuple are very expensive, row.IsNull should be skipped when operating on 1) non-nullable columns 2) Column that doesn't have nulls in the current batch 3) Columns involved in inner non-null safe joins
          Hide
          mjacobs Matthew Jacobs added a comment -

          Ok thanks. For #1 we can do that ourselves, #2 we need support from Kudu. Can you elaborate on #3?

          Show
          mjacobs Matthew Jacobs added a comment - Ok thanks. For #1 we can do that ourselves, #2 we need support from Kudu. Can you elaborate on #3?
          Hide
          mmokhtar Mostafa Mokhtar added a comment -
          Show
          mmokhtar Mostafa Mokhtar added a comment - Matthew Jacobs Referring to IMPALA-2743
          Hide
          mjacobs Matthew Jacobs added a comment -

          This code was removed previously

          Show
          mjacobs Matthew Jacobs added a comment - This code was removed previously
          Hide
          jbapple Jim Apple added a comment -

          This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09.

          2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that:

          https://github.com/apache/incubator-impala/commits/2.8.0

          Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0?

          Thank you!

          Show
          jbapple Jim Apple added a comment - This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09. 2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that: https://github.com/apache/incubator-impala/commits/2.8.0 Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0? Thank you!

            People

            • Assignee:
              mjacobs Matthew Jacobs
              Reporter:
              mmokhtar Mostafa Mokhtar
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development