Kudu / KUDU-2809

Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.10.0
    • Component/s: backup
    • Labels:


      I did the following sequence of operations:

      1. Insert 100 million rows
      2. Update 1 out of every 11 rows
      3. Make a full backup
      4. Insert 100 million more rows, after the original rows in keyspace
      5. Delete 1 out of every 23 rows
      6. Make an incremental backup

      The restore job failed to apply the incremental backup, with an error like:

      java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:

      Due to another bug, no sample errors were reported, but after hacking around that bug, I found that the incremental backup contained a row with a DELETE action for a key that is not present in the full backup. That's because the row was inserted in step 4 and deleted in step 5, between the two backups, so the restored table never contained it.
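
      The failure mode can be reproduced in a toy model. The sketch below is not Kudu code: plain dicts stand in for the table and the full backup, tuples stand in for backup rows, and the helper name `apply_strict` is made up for illustration. It scales the six steps down to 200 rows and assumes the incremental naively reports a DELETE for every key deleted since the full backup.

```python
# Toy model only: dicts and tuples stand in for Kudu tables and backup rows.

def apply_strict(table, incremental):
    """Apply row actions; a DELETE for a missing key is recorded as an error."""
    errors = []
    for action, key, value in incremental:
        if action == "UPSERT":
            table[key] = value
        elif action == "DELETE":
            if key in table:
                del table[key]
            else:
                errors.append(key)
    return errors

# Steps 1-3, scaled down: insert 100 rows, update every 11th, take a full backup.
source = {k: "v0" for k in range(100)}
for k in range(0, 100, 11):
    source[k] = "v1"
full_backup = dict(source)

# Steps 4-5: insert 100 more rows after the originals, then delete every 23rd row.
for k in range(100, 200):
    source[k] = "v0"
deleted = list(range(0, 200, 23))
for k in deleted:
    del source[k]

# Step 6: a naive incremental that emits a DELETE for every deleted key,
# including keys that were inserted after the full backup.
naive_incremental = [
    ("UPSERT", k, v) for k, v in source.items() if full_backup.get(k) != v
]
naive_incremental += [("DELETE", k, None) for k in deleted]

# Restore: start from the full backup and apply the incremental strictly.
restored = dict(full_backup)
failed_keys = apply_strict(restored, naive_incremental)
print(failed_keys)
```

      In this model the failing keys are exactly the deleted keys at or above 100, that is, the rows that were both inserted and deleted after the full backup was taken.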

      We could fix this by

      1. Making diff scan not return a DELETE for such a row
      2. Implementing and using DELETE IGNORE in the restore job
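
      Both options can be sketched in a toy model. This is an illustration under stated assumptions, not the Kudu implementation: dicts stand in for table snapshots, and `diff_rows` and `apply_delete_ignore` are hypothetical helper names. In particular, the real diff scan is not implemented by comparing two materialized snapshots; the comparison here only shows the intended result, where a key absent at the earlier snapshot never produces a DELETE.

```python
# Toy model only: dicts stand in for table snapshots; helper names are hypothetical.

def diff_rows(old_snapshot, new_snapshot):
    """Option 1: emit a DELETE only for keys that existed in the old snapshot."""
    actions = []
    for key, value in new_snapshot.items():
        if key not in old_snapshot or old_snapshot[key] != value:
            actions.append(("UPSERT", key, value))
    for key in old_snapshot:
        if key not in new_snapshot:
            actions.append(("DELETE", key, None))
    return actions

def apply_delete_ignore(table, incremental):
    """Option 2: treat a DELETE for a missing key as a no-op."""
    for action, key, value in incremental:
        if action == "UPSERT":
            table[key] = value
        elif action == "DELETE":
            table.pop(key, None)
    return table

# Key 4 was inserted and then deleted between the two snapshots,
# so it appears in neither of them.
old = {1: "a", 2: "b"}
new = {1: "a", 3: "c"}

# Option 1: the diff contains no DELETE for key 4.
option1_actions = diff_rows(old, new)

# Option 2: the diff still contains the spurious DELETE, but restore tolerates it.
naive_actions = [("UPSERT", 3, "c"), ("DELETE", 2, None), ("DELETE", 4, None)]
option2_table = apply_delete_ignore(dict(old), naive_actions)
```

      Option 1 keeps the incremental backup itself free of spurious actions, while option 2 leaves the backup unchanged and makes the restore tolerant of them.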




            • Assignee:
              adar Adar Dembo
              wdberkeley William Berkeley
            • Votes: 0
            • Watchers: 5


              • Created: