Kudu / KUDU-2731

Getting column schema information from KuduSchema requires copying a KuduColumnSchema object

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0
    • Fix Version/s: None
    • Component/s: perf
    • Labels:
      None

      Description

      I'm looking at a CPU profile of Impala inserting into Kudu. KuduTableSink::Send has code that schematically does the following:

      for each row in the batch {
        for each column col_idx {
          // Column() returns a KuduColumnSchema by value, so this copies it.
          if (schema.Column(col_idx).is_nullable()) {
            write->mutable_row()->SetNull(col_idx);
          }
        }
      }
      

      See kudu-table-sink.cc. However, KuduSchema::Column copies the column schema and returns it by value, so the if statement constructs and destroys a column schema object just to check if the column is nullable.
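
      Until the API changes, a caller could in principle sidestep the per-cell copy by reading each column's nullability once, before the row loop. The sketch below illustrates that idea with mock classes; it is not taken from Impala or from the Kudu client. MockColumnSchema, MockSchema, g_column_copies and both Copies... functions are hypothetical names introduced here, and only Column() and is_nullable() mirror calls mentioned in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts copy-constructions of MockColumnSchema (illustration only).
int g_column_copies = 0;

// Hypothetical stand-in for KuduColumnSchema; not the real client class.
class MockColumnSchema {
 public:
  MockColumnSchema(std::string name, bool nullable)
      : name_(std::move(name)), nullable_(nullable) {}
  MockColumnSchema(const MockColumnSchema& other)
      : name_(other.name_), nullable_(other.nullable_) {
    ++g_column_copies;
  }
  bool is_nullable() const { return nullable_; }

 private:
  std::string name_;
  bool nullable_;
};

// Hypothetical stand-in for KuduSchema; Column() returns by value.
class MockSchema {
 public:
  void AddColumn(const std::string& name, bool nullable) {
    columns_.emplace_back(name, nullable);
  }
  std::size_t num_columns() const { return columns_.size(); }
  MockColumnSchema Column(std::size_t idx) const {
    assert(idx < columns_.size());
    return columns_[idx];  // the copy happens here
  }

 private:
  std::vector<MockColumnSchema> columns_;
};

// Per-cell lookup: one column schema copy per (row, column) pair.
int CopiesWithPerCellLookup(const MockSchema& schema, int num_rows) {
  const int before = g_column_copies;
  int nullable_cells = 0;
  for (int row = 0; row < num_rows; ++row) {
    for (std::size_t col = 0; col < schema.num_columns(); ++col) {
      if (schema.Column(col).is_nullable()) ++nullable_cells;
    }
  }
  (void)nullable_cells;
  return g_column_copies - before;
}

// Hoisted lookup: copy each column schema once, then reuse cached flags.
int CopiesWithHoistedLookup(const MockSchema& schema, int num_rows) {
  const int before = g_column_copies;
  std::vector<bool> nullable(schema.num_columns());
  for (std::size_t col = 0; col < schema.num_columns(); ++col) {
    nullable[col] = schema.Column(col).is_nullable();
  }
  int nullable_cells = 0;
  for (int row = 0; row < num_rows; ++row) {
    for (std::size_t col = 0; col < nullable.size(); ++col) {
      if (nullable[col]) ++nullable_cells;
    }
  }
  (void)nullable_cells;
  return g_column_copies - before;
}
```

      With the mock, the per-cell version makes one copy for every row and column, while the hoisted version makes one copy per column regardless of batch size. This only helps callers that control the loop, so it does not remove the need for an API-level fix.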

      This is by far the biggest user of CPU in the Impala process (35% or so). The workload might be I/O bound writing to Kudu anyway, though. Nevertheless, we should provide a way to avoid this copying in the API, either by adding a method like

      class KuduSchema {
       public:
        const KuduColumnSchema& get_column(int idx) const;
      };
      

      or a method like

      class KuduSchema {
       public:
        bool is_column_nullable(int idx) const;
      };
      

      The former is the most flexible, while the latter frees the client from worrying about holding the reference longer than the KuduColumnSchema object lives. We would probably need several methods like the latter to cover other potentially useful checks, such as a column's encoding or type.
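
      To make the trade-off concrete, here is a minimal sketch of both accessor shapes on a mock schema. It assumes the columns live in a std::vector owned by the schema, which may not match the real KuduSchema internals. MockColumnSchema, MockSchema, g_column_copies and the two Copies... helpers are hypothetical; get_column and is_column_nullable follow the names proposed above, with std::size_t used for the index as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts copy-constructions of MockColumnSchema (illustration only).
int g_column_copies = 0;

// Hypothetical stand-in for KuduColumnSchema; not the real client class.
class MockColumnSchema {
 public:
  MockColumnSchema(std::string name, bool nullable)
      : name_(std::move(name)), nullable_(nullable) {}
  MockColumnSchema(const MockColumnSchema& other)
      : name_(other.name_), nullable_(other.nullable_) {
    ++g_column_copies;
  }
  bool is_nullable() const { return nullable_; }

 private:
  std::string name_;
  bool nullable_;
};

// Hypothetical stand-in for KuduSchema with both proposed accessors.
class MockSchema {
 public:
  void AddColumn(const std::string& name, bool nullable) {
    columns_.emplace_back(name, nullable);
  }
  std::size_t num_columns() const { return columns_.size(); }

  // Option 1: return a reference. No copy, but the reference is only
  // valid while this schema is alive and its column list is unchanged.
  const MockColumnSchema& get_column(std::size_t idx) const {
    assert(idx < columns_.size());
    return columns_[idx];
  }

  // Option 2: answer the question directly. No copy and no reference
  // escapes, but each property needs its own method.
  bool is_column_nullable(std::size_t idx) const {
    assert(idx < columns_.size());
    return columns_[idx].is_nullable();
  }

 private:
  std::vector<MockColumnSchema> columns_;
};

// Runs the nullability check per cell through the reference accessor.
int CopiesViaReference(const MockSchema& schema, int num_rows) {
  const int before = g_column_copies;
  int nullable_cells = 0;
  for (int row = 0; row < num_rows; ++row) {
    for (std::size_t col = 0; col < schema.num_columns(); ++col) {
      if (schema.get_column(col).is_nullable()) ++nullable_cells;
    }
  }
  (void)nullable_cells;
  return g_column_copies - before;
}

// Runs the same check through the dedicated predicate.
int CopiesViaPredicate(const MockSchema& schema, int num_rows) {
  const int before = g_column_copies;
  int nullable_cells = 0;
  for (int row = 0; row < num_rows; ++row) {
    for (std::size_t col = 0; col < schema.num_columns(); ++col) {
      if (schema.is_column_nullable(col)) ++nullable_cells;
    }
  }
  (void)nullable_cells;
  return g_column_copies - before;
}
```

      In the mock, neither accessor copies a column schema inside the loop. The difference is in the contract: a caller of get_column must not keep the reference past the lifetime of the schema (in this mock, also not across AddColumn, which can reallocate the vector), whereas is_column_nullable returns a plain bool with no lifetime concerns.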

              People

              • Assignee: Unassigned
              • Reporter: William Berkeley (wdberkeley)
