Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5217

KuduTableSink checks null constraints incorrectly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      Something I noticed while looking at kudu-table-sink.cc: we use the referenced_columns map to translate between the index into the output exprs 'j' and the index into columns in the Kudu table 'col', but we incorrectly use 'j' when calling into the Kudu table schema to check the nullability of columns.

        Activity

        Hide
        mjacobs Matthew Jacobs added a comment -

        Nice catch. This seems like this could be serious.

              void* value = output_expr_ctxs_[j]->GetValue(current_row);
              if (value == NULL) {
                if (table_schema.Column(j).is_nullable()) {
                  KUDU_RETURN_IF_ERROR(write->mutable_row()->SetNull(col),
                      "Could not add Kudu WriteOp.");
                  continue;
                } else {
                  // This row violates the nullability constraints of the column, do not attempt
                  // to write this row because it is already known to be an error and the Kudu
                  // error will be difficult to interpret later (error code isn't specific).
                  ++num_null_violations;
                  state->LogError(ErrorMsg::Init(TErrorCode::KUDU_NULL_CONSTRAINT_VIOLATION,
                      table_desc_->table_name()));
                  add_row = false;
                  break; // skip remaining columns for this row
                }
              }
        

        It sounds like one of two wrong things may happen:
        1) for some row, this sets a col to NULL that is actually not nullable, but then l238 should return an error. The query would fail.
        2) for some row, we count a NULL value as a null constraint violation even though it should be set as NULL, then we do not write the row. l248.

        #2 seems serious as we'd start dropping data that we shouldn't.

        I don't have a repro for this, but suspect it should be possible if the DML stmt doesn't reference all the kudu tbl columns and there are a bunch of nullable and non nullable columns.

        Thomas, are you just going to fix this as part of your partitioning & sorting change?

        Show
        mjacobs Matthew Jacobs added a comment - Nice catch. This seems like this could be serious. void* value = output_expr_ctxs_[j]->GetValue(current_row); if (value == NULL) { if (table_schema.Column(j).is_nullable()) { KUDU_RETURN_IF_ERROR(write->mutable_row()->SetNull(col), "Could not add Kudu WriteOp." ); continue ; } else { // This row violates the nullability constraints of the column, do not attempt // to write this row because it is already known to be an error and the Kudu // error will be difficult to interpret later (error code isn't specific). ++num_null_violations; state->LogError(ErrorMsg::Init(TErrorCode::KUDU_NULL_CONSTRAINT_VIOLATION, table_desc_->table_name())); add_row = false ; break ; // skip remaining columns for this row } } It sounds like one of two wrong things may happen: 1) for some row, this sets a col to NULL that is actually not nullable, but then l238 should return an error. The query would fail. 2) for some row, we count a NULL value as a null constraint violation even though it should be set as NULL, then we do not write the row. l248. #2 seems serious as we'd start dropping data that we shouldn't. I don't have a repro for this, but suspect it should be possible if the DML stmt doesn't reference all the kudu tbl columns and there are a bunch of nullable and non nullable columns. Thomas, are you just going to fix this as part of your partitioning & sorting change?
        Hide
        twmarshall Thomas Tauber-Marshall added a comment -

        I'll submit a fix as a separate patch.

        Show
        twmarshall Thomas Tauber-Marshall added a comment - I'll submit a fix as a separate patch.
        Hide
        twmarshall Thomas Tauber-Marshall added a comment -

        commit baba8960b3cdb167f4eaa300559813c5cea2786d
        Author: Thomas Tauber-Marshall <tmarshall@cloudera.com>
        Date: Tue Apr 18 11:29:51 2017 -0700

        IMPALA-5217: KuduTableSink checks null constraints incorrectly

        KuduTableSink uses the referenced_columns map to translate between the
        index into the output exprs 'j' and the index into columns in the Kudu
        table 'col', but we incorrectly use 'j' when calling into the Kudu table
        schema to check the nullability of columns.

        Testing:

        • Added e2e tests to kudu_insert.test

        Change-Id: I8ed458278f135288a821570939de8ee294183df2
        Reviewed-on: http://gerrit.cloudera.org:8080/6670
        Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
        Tested-by: Impala Public Jenkins

        Show
        twmarshall Thomas Tauber-Marshall added a comment - commit baba8960b3cdb167f4eaa300559813c5cea2786d Author: Thomas Tauber-Marshall <tmarshall@cloudera.com> Date: Tue Apr 18 11:29:51 2017 -0700 IMPALA-5217 : KuduTableSink checks null constraints incorrectly KuduTableSink uses the referenced_columns map to translate between the index into the output exprs 'j' and the index into columns in the Kudu table 'col', but we incorrectly use 'j' when calling into the Kudu table schema to check the nullability of columns. Testing: Added e2e tests to kudu_insert.test Change-Id: I8ed458278f135288a821570939de8ee294183df2 Reviewed-on: http://gerrit.cloudera.org:8080/6670 Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com> Tested-by: Impala Public Jenkins

          People

          • Assignee:
            twmarshall Thomas Tauber-Marshall
            Reporter:
            twmarshall Thomas Tauber-Marshall
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development