[KUDU-2809] Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.9.0
Fix Version/s: 1.10.0
Component/s: backup
Labels: backup
Target Version/s: 1.10.0

Description

I did the following sequence of operations:

1. Insert 100 million rows
2. Update 1 out of every 11 rows
3. Make a full backup
4. Insert 100 million more rows, after the original rows in keyspace
5. Delete 1 out of every 23 rows
6. Make an incremental backup

The restore failed to apply the incremental backup, with an error like:

java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:

Due to another bug, the message contained no sample errors, but after hacking around that bug, I found that the incremental backup contained a row with a DELETE action for a key that is not present in the full backup. That's because the row was inserted in step 4 and deleted in step 5, between the two backups.
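The failure mode can be modeled in a few lines. This is a toy Python sketch, not Kudu code: `diff_scan` and `strict_restore` are hypothetical helpers. The first mimics a diff scan that reports a DELETE for every key whose last mutation in the interval was a delete, and the second mimics a restore that treats deleting a missing key as an error.

```python
# Toy model of the failure; none of these names are Kudu APIs.

def diff_scan(mutations):
    """Collapse an interval's mutations into one final action per key.

    Mimics the reported behavior: a key whose last mutation is a delete
    yields DELETE even if the key was first inserted within the interval.
    """
    last = {}
    for op, key in mutations:
        last[key] = "DELETE" if op == "delete" else "UPSERT"
    return sorted((key, action) for key, action in last.items())


def strict_restore(table, incremental):
    """Apply an incremental to `table`; deleting an absent key is an error."""
    errors = []
    for key, action in incremental:
        if action == "UPSERT":
            table.add(key)
        elif key in table:
            table.remove(key)
        else:
            errors.append(key)  # DELETE for a key the restore has never seen
    return errors


full_backup = {1, 2, 3}  # keys present when the full backup was taken
# Key 5 is inserted and then deleted between the two backups.
between_backups = [("insert", 4), ("insert", 5), ("delete", 5), ("delete", 2)]

incremental = diff_scan(between_backups)
restored = set(full_backup)
failed_keys = strict_restore(restored, incremental)
```

In this model, key 5 never appears in the full backup, so its DELETE in the incremental is the only action that ends up in `failed_keys`.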

We could fix this by:

- Making diff scan not return a DELETE for such a row
- Implementing and using DELETE IGNORE in the restore job
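The two options can be compared in a toy Python model. This is only a sketch under simplifying assumptions: tables are sets of keys, and `diff_scan_fixed` and `restore_delete_ignore` are hypothetical names, not Kudu APIs. It does not say which option was actually implemented.

```python
# Toy model of the two proposed fixes; none of these names are Kudu APIs.

def diff_scan_fixed(existing_at_start, mutations):
    """Option 1: omit DELETE for keys that did not exist at the interval start."""
    last = {}
    for op, key in mutations:
        last[key] = "DELETE" if op == "delete" else "UPSERT"
    return sorted(
        (key, action)
        for key, action in last.items()
        if action == "UPSERT" or key in existing_at_start
    )


def restore_delete_ignore(table, incremental):
    """Option 2: apply deletes with ignore semantics; a missing key is a no-op."""
    for key, action in incremental:
        if action == "UPSERT":
            table.add(key)
        else:
            table.discard(key)  # set.discard never raises for absent keys
    return table


full_backup = {1, 2, 3}
# Key 5 is inserted and then deleted between the two backups.
between_backups = [("insert", 4), ("insert", 5), ("delete", 5), ("delete", 2)]

# Option 1: the incremental no longer mentions key 5 at all.
option1_incremental = diff_scan_fixed(full_backup, between_backups)

# Option 2: the incremental still carries the stray DELETE, but restore tolerates it.
stray_incremental = [(2, "DELETE"), (4, "UPSERT"), (5, "DELETE")]
option2_table = restore_delete_ignore(set(full_backup), stray_incremental)
```

Both options leave the restored table with the same keys in this model. The difference is where the tolerance lives: option 1 keeps the incremental free of rows the restore cannot interpret, while option 2 leaves the incremental unchanged and relaxes the restore.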

People

Assignee: Adar Dembo

Reporter: William Berkeley

Votes: 0

Watchers: 5

Dates

Created: 02/May/19 23:49

Updated: 08/Jun/19 00:22

Resolved: 08/Jun/19 00:22