Kudu / KUDU-2809

Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.10.0
    • Component/s: backup
    • Labels:

      Description

      I did the following sequence of operations:

      1. Insert 100 million rows
      2. Update 1 out of every 11 rows
      3. Make a full backup
      4. Insert 100 million more rows, after the original rows in keyspace
      5. Delete 1 out of every 23 rows
      6. Make an incremental backup

      Restore failed to apply the incremental backup, with an error like:

      java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:
      

      Due to another bug, there are no sample errors, but after hacking around that bug, I found that the incremental backup contained a row with a DELETE action for a key that is not present in the full backup. That's because the row was inserted in step 4 and deleted in step 5, i.e. between the two backups, so the restored table never contained it when the DELETE was applied.
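
To make the failure mode concrete, here is a scaled-down toy model of the sequence above (hundreds of rows instead of 100 million each). It is only a sketch: the in-memory history and the names `build_history`, `live_keys`, `naive_diff`, and `apply_strict` are hypothetical and are not Kudu's diff scan or restore code. It assumes a diff that reports only the final state of each key touched between the two backup timestamps, and a restore that rejects a DELETE for a missing key.

```python
# Toy in-memory model of the reported scenario; not Kudu code.
# All function names here are hypothetical.

def build_history():
    """Scaled-down steps 1-6 as a list of (timestamp, op, key)."""
    history, ts = [], 0
    for key in range(0, 100):          # step 1: insert the original rows
        ts += 1
        history.append((ts, "INSERT", key))
    for key in range(0, 100, 11):      # step 2: update 1 out of every 11 rows
        ts += 1
        history.append((ts, "UPDATE", key))
    full_ts = ts                       # step 3: full backup
    for key in range(100, 200):        # step 4: insert more rows after the originals
        ts += 1
        history.append((ts, "INSERT", key))
    for key in range(0, 200, 23):      # step 5: delete 1 out of every 23 rows
        ts += 1
        history.append((ts, "DELETE", key))
    incr_ts = ts                       # step 6: incremental backup
    return history, full_ts, incr_ts

def live_keys(history, as_of):
    """Keys that exist in the table as of the given timestamp."""
    live = set()
    for ts, op, key in history:
        if ts <= as_of:
            if op == "DELETE":
                live.discard(key)
            else:
                live.add(key)
    return live

def naive_diff(history, start, end):
    """Final action per key touched in (start, end], ignoring state at start."""
    touched = {key for ts, _, key in history if start < ts <= end}
    live_at_end = live_keys(history, end)
    return [("UPSERT" if k in live_at_end else "DELETE", k) for k in sorted(touched)]

def apply_strict(table, diff):
    """Apply a diff; a DELETE for a key that is not present is an error."""
    for action, key in diff:
        if action == "DELETE":
            if key not in table:
                raise KeyError(f"DELETE for missing key {key}")
            table.remove(key)
        else:
            table.add(key)

history, full_ts, incr_ts = build_history()
restored = live_keys(history, full_ts)           # table after restoring the full backup
diff = naive_diff(history, full_ts, incr_ts)     # contents of the incremental
orphans = [k for a, k in diff if a == "DELETE" and k not in restored]
print(orphans)  # keys inserted in step 4 and deleted in step 5
```

In this model the orphaned keys are exactly those from the second batch of inserts that the delete pass also hit, and `apply_strict(set(restored), diff)` raises on the first of them.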

      We could fix this by:

      1. Making diff scan not return a DELETE for such a row
      2. Implementing and using DELETE IGNORE in the restore job
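
The two options above can be sketched against a tiny in-memory model. This is an illustration under stated assumptions, not Kudu's implementation: `diff_scan`, `apply_diff`, and the `suppress_orphan_deletes` and `delete_ignore` flags are hypothetical names, and "DELETE IGNORE" is modeled simply as a delete that silently skips missing keys.

```python
# Toy model of the two proposed fixes; not Kudu code.
# All names and flags here are hypothetical.

# (timestamp, op, key): full backup taken at ts=2, incremental at ts=5.
# Key "c" is inserted and deleted between the two backups.
HISTORY = [
    (1, "INSERT", "a"),
    (2, "INSERT", "b"),
    (3, "INSERT", "c"),
    (4, "DELETE", "c"),
    (5, "DELETE", "a"),
]
FULL_TS, INCR_TS = 2, 5

def live_keys(history, as_of):
    """Keys that exist in the table as of the given timestamp."""
    live = set()
    for ts, op, key in history:
        if ts <= as_of:
            if op == "DELETE":
                live.discard(key)
            else:
                live.add(key)
    return live

def diff_scan(history, start, end, suppress_orphan_deletes=False):
    """Final action per key touched in (start, end].

    With suppress_orphan_deletes=True (option 1), a key that did not exist
    at `start` and does not exist at `end` is omitted from the diff.
    """
    touched = sorted({key for ts, _, key in history if start < ts <= end})
    live_at_start = live_keys(history, start)
    live_at_end = live_keys(history, end)
    diff = []
    for key in touched:
        if key in live_at_end:
            diff.append(("UPSERT", key))
        elif key in live_at_start or not suppress_orphan_deletes:
            diff.append(("DELETE", key))
    return diff

def apply_diff(table, diff, delete_ignore=False):
    """Apply a diff to a restored table.

    With delete_ignore=True (option 2), a DELETE for a missing key is skipped
    instead of raising.
    """
    for action, key in diff:
        if action == "UPSERT":
            table.add(key)
        elif key in table:
            table.remove(key)
        elif not delete_ignore:
            raise KeyError(f"DELETE for missing key {key!r}")

expected = live_keys(HISTORY, INCR_TS)

# Option 1: the diff itself omits the orphaned DELETE, so a strict restore works.
table1 = live_keys(HISTORY, FULL_TS)
apply_diff(table1, diff_scan(HISTORY, FULL_TS, INCR_TS, suppress_orphan_deletes=True))

# Option 2: the diff still contains the orphaned DELETE, but the restore tolerates it.
table2 = live_keys(HISTORY, FULL_TS)
apply_diff(table2, diff_scan(HISTORY, FULL_TS, INCR_TS), delete_ignore=True)

print(table1 == expected, table2 == expected)
```

In this model both options leave the restored table equal to the source table as of the incremental's timestamp. The difference is where the tolerance lives: option 1 keeps the incremental free of rows the restore target never saw, while option 2 leaves the diff unchanged and relaxes the restore.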

            People

            • Assignee: Adar Dembo (adar)
            • Reporter: William Berkeley (wdberkeley)