Details
-
Bug
-
Status: Closed
-
Blocker
-
Resolution: Duplicate
-
None
-
None
-
4
Description
The append handle now always writes the data block first and then the delete block, and the delete block keeps only the hoodie keys. When reading, the scanner reads the DELETE block without any ordering value information. Thus, if we write two records:
insert:
{id: 0, ts: 2}
delete:
{id: 0, ts: 1}
the insert message ends up deleted even though its ts is higher than the delete's. This is a critical bug for streaming writes, and we should fix it as soon as possible.
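The scenario above can be reproduced with a small model. This is only a sketch in Python, not Hudi code: the function names `scan_keys_only` and `scan_with_ordering` are hypothetical, blocks are simplified to plain lists, and `ts` is assumed to play the role of the ordering value.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]  # e.g. {"id": 0, "ts": 2}; "ts" stands in for the ordering value


def _latest_per_key(data_block: List[Record]) -> Dict[int, Record]:
    """Keep, for each id, the record with the highest ts (later arrival wins ties)."""
    merged: Dict[int, Record] = {}
    for rec in data_block:
        current = merged.get(rec["id"])
        if current is None or rec["ts"] >= current["ts"]:
            merged[rec["id"]] = rec
    return merged


def scan_keys_only(data_block: List[Record], delete_keys: List[int]) -> Dict[int, Record]:
    """Model of the reported behavior: the delete block holds keys only,
    so every listed key is dropped regardless of event time."""
    merged = _latest_per_key(data_block)
    for key in delete_keys:
        merged.pop(key, None)
    return merged


def scan_with_ordering(
    data_block: List[Record], deletes: List[Tuple[int, int]]
) -> Dict[int, Record]:
    """Hypothetical alternative: each delete carries (key, ts) and removes
    the record only if the delete's ts is not older than the record's ts."""
    merged = _latest_per_key(data_block)
    for key, ts in deletes:
        current = merged.get(key)
        if current is not None and ts >= current["ts"]:
            del merged[key]
    return merged


data = [{"id": 0, "ts": 2}]
print(scan_keys_only(data, [0]))           # {}
print(scan_with_ordering(data, [(0, 1)]))  # {0: {'id': 0, 'ts': 2}}
```

In this model the keys-only scan loses the insert, while the ordering-aware scan keeps it because the delete is older.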
Here is the discussion on Slack:
Danny Chan 12:42 PM
https://issues.apache.org/jira/browse/HUDI-2299
12:43
Hi @vc, our user found a critical bug in the MOR log format: if there are out-of-order DELETEs in the streaming messages, the event time of the DELETEs is totally ignored.
12:44
I guess this should be a blocker for 0.9 because it affects the correctness of the data set.
vc 12:44 PM
if we can fix it by end of day friday PST
12:44
we can add it
12:44
Just want to cut a release this week.
12:45
Do you have a sense for the fix? bandwidth to take it up?
Danny Chan 12:46 PM
I tried to fix it but could not figure out a good way: if the DELETE block records the orderingVal, the format breaks compatibility.
vc 1:05 PM
We can version the format. That's doable. Should we precombine before even logging the deletes?
Danny Chan 1:11 PM
Yes, we should
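The idea of precombining before logging deletes can be sketched as follows. This is a hedged illustration in Python, not Hudi's write path: `Event` and `precombine` are hypothetical names, and the assumption is that each incoming message, including a delete, still has its ordering value at write time.

```python
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    op: str   # "insert" or "delete"
    key: int
    ts: int   # ordering value (event time)


def precombine(batch: List[Event]) -> Tuple[List[Event], List[int]]:
    """Reduce a write batch to one event per key (highest ts wins, later
    arrival wins ties), then split it into data records and delete keys."""
    latest: Dict[int, Event] = {}
    for ev in batch:
        current = latest.get(ev.key)
        if current is None or ev.ts >= current.ts:
            latest[ev.key] = ev
    data_block = [ev for ev in latest.values() if ev.op != "delete"]
    delete_keys = [ev.key for ev in latest.values() if ev.op == "delete"]
    return data_block, delete_keys


# The insert (ts=2) beats the older delete (ts=1), so no delete key is logged.
print(precombine([Event("insert", 0, 2), Event("delete", 0, 1)]))
```

In this model the comparison only sees events from the same batch; a delete that arrives in a later batch would still be logged as a bare key.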
vc 1:26 PM
I think that's how it's working today. Deletes don't have an ordering val per se, right?
1:28
Delete block at t1 :
delete key k
Data block at t2 :
ins key k with ordering val 2
We can just fix it so that the insert shows up, since t2 > t1.
For the kind of functionality you need, we need to do soft deletes, i.e. updates with an ordering value, instead of hard deletes
1:28
makes sense?
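The two semantics in the message above (hard deletes resolved by commit time, soft deletes resolved by ordering value) can be contrasted in a small model. This is a sketch under assumptions, not Hudi code: `Block`, `Row`, `merge_by_commit_time`, `merge_soft_deletes`, and the `is_deleted` flag are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Block(NamedTuple):
    commit_time: int
    kind: str                           # "data" or "delete"
    key: str
    ordering_val: Optional[int] = None  # hard deletes carry no ordering value


def merge_by_commit_time(blocks: List[Block]) -> Dict[str, Optional[int]]:
    """Hard deletes: replay blocks in commit order, so the later commit wins."""
    state: Dict[str, Optional[int]] = {}
    for block in sorted(blocks, key=lambda b: b.commit_time):
        if block.kind == "delete":
            state.pop(block.key, None)
        else:
            state[block.key] = block.ordering_val
    return state


class Row(NamedTuple):
    key: str
    ordering_val: int
    is_deleted: bool = False  # hypothetical soft-delete flag


def merge_soft_deletes(rows: List[Row]) -> Dict[str, Row]:
    """Soft deletes: a delete is an update with an ordering value, so it only
    hides the key if its ordering value is the latest one seen."""
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row.key)
        if current is None or row.ordering_val >= current.ordering_val:
            latest[row.key] = row
    return {k: r for k, r in latest.items() if not r.is_deleted}


# Delete block at t1, data block at t2: the insert shows up since t2 > t1.
print(merge_by_commit_time([Block(1, "delete", "k"), Block(2, "data", "k", 2)]))
# Soft delete with ordering value 1 arrives after an insert with ordering value 2.
print(merge_soft_deletes([Row("k", 2), Row("k", 1, is_deleted=True)]))
```

The first function depends only on arrival order, while the second depends only on the ordering value, which is the property a CDC consumer would rely on.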
Danny Chan 1:32 PM
We can, but that's not the perfect solution, especially if the dataset comes from a CDC source, for example the MySQL binlog. There is no extra flag in the schema for soft delete though.
1:37
In my opinion, it is not about soft DELETE or hard DELETE. Even if we do a soft DELETE, the event time (orderingVal) is still important for consumers for versioning.
vc 1:57 PM
tbh, I don't see us fixing this in two days
1:58
let's do a 0.9.1 after this?
1:58
shortly after with a bunch of bug fixes and the large pending PRs
1:58
we can even make it 0.10.0
Danny Chan 1:58 PM
Yes, the cut time is very soon. We can move the fix to next version.
vc 1:59 PM
We have some inconsistent semantics in places
1:59
some are commit time (arrival time) based and some are orderingVal (event time) based
2:00
In the meantime, see HoodieDeleteBlockVersion. You can just define a new version for the delete block alone, e.g.,
2:00
and add more information
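The suggestion to version the delete block and add more information can be illustrated with a toy encoding. The byte layout below is invented for this sketch and is not Hudi's on-disk format or the behavior of HoodieDeleteBlockVersion; `encode_delete_block` and `decode_delete_block` are hypothetical names. The point is only that a version header lets a reader accept both the old keys-only payload and a new payload that also carries an ordering value.

```python
import struct
from typing import List, Optional, Tuple

DeleteEntry = Tuple[str, Optional[int]]  # (record key, ordering value or None)


def encode_delete_block(entries: List[DeleteEntry], version: int) -> bytes:
    """Version 1 stores keys only; version 2 also stores an 8-byte ordering value."""
    if version not in (1, 2):
        raise ValueError(f"unsupported version {version}")
    out = bytearray(struct.pack(">II", version, len(entries)))
    for key, ordering_val in entries:
        raw = key.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw
        if version == 2:
            if ordering_val is None:
                raise ValueError("version 2 requires an ordering value")
            out += struct.pack(">q", ordering_val)
    return bytes(out)


def decode_delete_block(buf: bytes) -> List[DeleteEntry]:
    """Read the version header first, then parse entries according to it."""
    version, count = struct.unpack_from(">II", buf, 0)
    if version not in (1, 2):
        raise ValueError(f"unsupported version {version}")
    offset = 8
    entries: List[DeleteEntry] = []
    for _ in range(count):
        (key_len,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        key = buf[offset:offset + key_len].decode("utf-8")
        offset += key_len
        ordering_val: Optional[int] = None
        if version == 2:
            (ordering_val,) = struct.unpack_from(">q", buf, offset)
            offset += 8
        entries.append((key, ordering_val))
    return entries


# Old blocks still decode (ordering value unknown); new blocks keep it.
print(decode_delete_block(encode_delete_block([("k", None)], version=1)))
print(decode_delete_block(encode_delete_block([("k", 1)], version=2)))
```

A reader built this way can fall back to commit-time semantics when the ordering value is None and use event-time semantics otherwise.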
Issue Links
- duplicates
- HUDI-2752 The MOR DELETE block breaks the event time sequence of CDC
- Closed