IMPALA / IMPALA-3704

Fix error message when violating key constraint in Kudu insert

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      When inserting rows into Kudu using Impala, Impala reports an unfriendly error message when a unique key constraint is violated. We need to improve this.

      impala-shell> insert into t1 values (1,1);
      WARNINGS: Error while flushing Kudu session:
      Already present: entry already present in memrowset
      
      
      Error while flushing Kudu session:
      Already present: entry already present in memrowset
      

      Another issue arises with INSERT ... SELECT statements when some of the inserted rows violate the unique key constraint:

      impala-shell> create table t1 (a int, b int) ... <--- kudu table
      impala-shell> insert into t1 values(1,1);
      impala-shell> insert into t1 select cast(a + 100 as int), b from t1; <-- works well
      impala-shell> insert into t1 select cast(a + 100 as int), b from t1;
      

      The last statement reports an error, indicating that the insert failed. However, a subsequent SELECT on t1 shows that some rows were in fact inserted. We should improve the error message and always report the number of inserted rows.

        Activity

        Matthew Jacobs (mjacobs) added a comment:

        Inserts into Kudu that result in constraint violations (e.g. duplicate PK) return an error with a message for every row that had a violation. This quickly becomes unwieldy, and is unusable when there are many violations. This should be handled as well.

        Matthew Jacobs (mjacobs) added a comment:

        Juan Yu, I already have most of this patch.

        Juan Yu (jyu@cloudera.com) added a comment:

        Sorry for the delay, I was busy with something else. Thanks for taking care of this one.

        Matthew Jacobs (mjacobs) added a comment:

        commit 99ed6dc67ae889eb2a45b10c97cb23f52bc83e5d
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Wed Oct 19 15:30:58 2016 -0700

        IMPALA-4134,IMPALA-3704: Kudu INSERT improvements

        1.) IMPALA-4134: Use Kudu AUTO FLUSH
        Improves performance of writes to Kudu by up to 4.2x in
        bulk data loading tests (loading 200 million rows from
        lineitem).

        2.) IMPALA-3704: Improve errors on PK conflicts
        The Kudu client reports an error for every PK conflict,
        and all errors were being returned in the error status.
        As a result, inserts/updates/deletes could return errors
        with thousands of errors reported. This changes the error
        handling to log all reported errors as warnings and
        return only the first error in the query error status.

        3.) Improve the DataSink reporting of the insert stats.
        The per-partition stats returned by the data sink weren't
        useful for Kudu sinks. Firstly, the number of appended rows
        was not being displayed in the profile. Secondly, the
        'stats' field isn't populated for Kudu tables and thus was
        confusing in the profile, so it is no longer printed if it
        is not set in the thrift struct.

        Testing: Ran local tests, including new tests to verify
        the query profile insert stats. The AUTO FLUSH
        functionality was tested manually on a cluster, and that
        testing informed the default mutation buffer size of
        100MB, which was found to provide good results.

        Change-Id: I5542b9a061b01c543a139e8722560b1365f06595
        Reviewed-on: http://gerrit.cloudera.org:8080/4728
        Reviewed-by: Matthew Jacobs <mj@cloudera.com>
        Tested-by: Internal Jenkins


          People

          • Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            dtsirogiannis Dimitris Tsirogiannis
          • Votes:
            0
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
