Description
I did the following sequence of operations:
1. Insert 100 million rows
2. Update 1 out of every 11 rows
3. Make a full backup
4. Insert 100 million more rows, after the original rows in keyspace
5. Delete 1 out of every 23 rows
6. Make an incremental backup
The restore job failed to apply the incremental backup, with an error like:
java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:
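Due to another bug, there are no sample errors in the message, but after hacking around that bug, I found that the incremental backup contained a row with a DELETE action for a key that is not present in the full backup. That's because the row was inserted in step 4 (the second batch of inserts) and deleted in step 5 (the deletes), both between the two backups, so the key does not exist in the restored table when the DELETE is applied.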
We could fix this by one of:
- Making diff scan not return a DELETE for such a row
- Implementing and using DELETE IGNORE in the restore job
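
To make the failure and the two proposed fixes concrete, here is a minimal toy model in Python. It is only a sketch and does not use any Kudu API: tables are plain dicts, a backup is a list of (action, key, value) rows, and the names diff_scan, restore, suppress_transient_deletes and delete_ignore are hypothetical, chosen for illustration.

```python
def diff_scan(start_snapshot, ops, suppress_transient_deletes=False):
    """Toy diff scan: return the net change for each key touched by ops.

    start_snapshot is the table content at the previous backup.
    With suppress_transient_deletes=True, no DELETE is emitted for a key
    that was both inserted and deleted inside the window (first proposed fix).
    """
    state = dict(start_snapshot)
    touched = []
    for action, key, value in ops:
        if key not in touched:
            touched.append(key)
        if action == "DELETE":
            state.pop(key, None)
        else:
            state[key] = value

    rows = []
    for key in touched:
        if key in state:
            rows.append(("UPSERT", key, state[key]))
        elif key in start_snapshot or not suppress_transient_deletes:
            rows.append(("DELETE", key, None))
    return rows


def restore(full_backup, incremental, delete_ignore=False):
    """Toy restore: load the full backup, then apply the incremental rows.

    A plain DELETE of a missing key is an error; with delete_ignore=True
    it is silently skipped (second proposed fix).
    """
    table = dict(full_backup)
    for action, key, value in incremental:
        if action == "DELETE":
            if key not in table:
                if delete_ignore:
                    continue
                raise KeyError(f"DELETE for key not present: {key}")
            del table[key]
        else:
            table[key] = value
    return table


# Scaled-down version of the scenario: key 3 is inserted and then deleted
# between the full and the incremental backup.
full = {1: "a", 2: "b"}
ops = [("INSERT", 3, "c"), ("INSERT", 4, "d"),
       ("DELETE", 3, None), ("DELETE", 1, None)]

buggy_incremental = diff_scan(full, ops)
fixed_incremental = diff_scan(full, ops, suppress_transient_deletes=True)
```

In this model, restore(full, buggy_incremental) raises on the DELETE for key 3, mirroring the reported failure, while either restore(full, buggy_incremental, delete_ignore=True) or restore(full, fixed_incremental) yields the expected table. The two fixes are independent, so applying both would also work.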