Details
- Type: Improvement
- Status: Patch Available
- Priority: Minor
- Resolution: Unresolved
Description
Some external workflow schedulers can, whether by accident or misbehavior, schedule duplicate executions of the same compaction plan. We need a way to guard against this inside Hudi (rather than requiring the user to take a lock externally). In such a scenario, two instances of the job concurrently call `org.apache.hudi.client.BaseHoodieTableServiceClient#compact` on the same compaction instant.
This is a problem because one writer might execute the instant and create an inflight, while the other writer sees the inflight and tries to roll it back before re-attempting execution (since it assumes the inflight is a previously failed compaction attempt).
This logic should be updated so that only one writer actually executes the compaction plan at a time, while the others fail/abort.
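To make the failure mode concrete, here is a minimal sketch of the duplicate-scheduling scenario. `CompactionClient` and the thread-pool driver are hypothetical stand-ins for two scheduler-launched jobs racing on `BaseHoodieTableServiceClient#compact`; none of this is Hudi code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DuplicateCompactionRepro {

  // Hypothetical stand-in for BaseHoodieTableServiceClient#compact.
  interface CompactionClient {
    void compact(String instantTime);
  }

  // Simulates a misbehaving scheduler launching the same compaction twice.
  static void scheduleTwice(CompactionClient client, String instantTime) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    // Both submissions target the same instant: one writer transitions it to
    // inflight, while the other sees that inflight, assumes it is a previously
    // failed attempt, and rolls it back before re-executing.
    pool.submit(() -> client.compact(instantTime));
    pool.submit(() -> client.compact(instantTime));
    pool.shutdown();
  }
}
```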
One approach is to use a transaction (base table lock) in conjunction with heartbeating: the writer triggers a heartbeat before executing compaction, and any concurrent writer uses the heartbeat to check whether the compaction is currently being executed by another writer. Specifically, the compact API should execute the following steps (see the sketch after this list):
- Get the instant to compact, C (as usual)
- Start a transaction
- Check whether C has an active heartbeat; if so, finish the transaction and throw an exception
- Start a heartbeat for C (this implicitly re-starts the heartbeat if it was previously started by another job)
- Finish the transaction
- Run the existing compact API logic on C
- If execution succeeds, clean up the heartbeat file; if it fails, do nothing (the heartbeat will expire automatically later)
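A minimal sketch of these steps, assuming hypothetical `TableLock` and `HeartbeatService` abstractions in place of Hudi's actual lock and heartbeat machinery (e.g. `TransactionManager` and `HoodieHeartbeatClient`); all names and signatures here are illustrative, not a definitive implementation.

```java
public class GuardedTableService {

  // Hypothetical base-table lock, standing in for a Hudi transaction.
  interface TableLock {
    void acquire();
    void release();
  }

  // Hypothetical heartbeat abstraction (cf. Hudi's HoodieHeartbeatClient).
  interface HeartbeatService {
    boolean isActive(String instantTime); // unexpired heartbeat for this instant?
    void start(String instantTime);       // start (or re-start) heartbeating
    void stop(String instantTime);        // delete the heartbeat file
  }

  private final TableLock lock;
  private final HeartbeatService heartbeats;

  public GuardedTableService(TableLock lock, HeartbeatService heartbeats) {
    this.lock = lock;
    this.heartbeats = heartbeats;
  }

  // Runs `action` for `instantTime`, ensuring only one concurrent writer
  // executes it; any other writer fails fast instead of rolling back.
  public void guardedExecute(String instantTime, Runnable action) {
    lock.acquire();                  // start a transaction
    try {
      if (heartbeats.isActive(instantTime)) {
        // Another writer holds an active heartbeat for this instant.
        throw new IllegalStateException(
            "Instant " + instantTime + " is being executed by another writer");
      }
      heartbeats.start(instantTime); // claim the instant before executing
    } finally {
      lock.release();                // lock held only for the check/claim
    }
    action.run();                    // existing compact API logic
    heartbeats.stop(instantTime);    // success: remove the heartbeat file; if
                                     // the action threw, this is skipped and
                                     // the heartbeat simply expires later
  }
}
```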
Note that this approach only holds the table lock temporarily, while checking/starting the heartbeat.
This flow can also be applied to the execution of clean plans and other table services.
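For example, using the hypothetical `guardedExecute` helper sketched above (again, `guard` and `client` are assumed names, not Hudi's actual API):

```java
// Wrap any table service the same way; only one writer executes each plan.
guard.guardedExecute(compactionInstant, () -> client.compact(compactionInstant));
guard.guardedExecute(cleanInstant, () -> client.clean(cleanInstant));
```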