Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Description
Even with the synchronous patch, we instantiate the metadata table in single-writer mode only.
But we need to support async compaction and cleaning, so we need to think about supporting multi-writer down the line.
Details:
All writes to the metadata table happen within the data table lock, including compaction and cleaning in the metadata table, since we run those inline. But as we scale the metadata table infra with more indexes, we need to support async compaction and cleaning, and so we need multi-writer support.
One possibility:
- Special transaction management for the metadata table.
Data table commits: all writes to the metadata table will be guarded by the data table lock (regular writes, clustering, compaction, everything). Regular writes will do the usual conflict resolution, whereas compaction and clustering may not.
Metadata table commits: in general there won't be any conflict resolution for the metadata table as a whole, but we will ensure that every commit happens only after acquiring a lock. Our presumption is that all conflict resolution has already happened within the data table before a commit is made to the metadata table, so no conflict resolution is needed specifically for the metadata table.
Scheduling of compaction and cleaning will happen along with regular upserts, and we will support executing them asynchronously. When these async operations are ready to commit to the metadata table, they will acquire the lock, make the commit, and release the lock. Only one writer will be in progress during a metadata table commit.
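The proposal above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Hudi code: `MetadataCommitSketch`, `beginWrite`, `commitDataTable`, and `commitMetadataService` are hypothetical names, a plain `ReentrantLock` stands in for whatever lock provider the table uses, and the overlap check on file IDs is only a placeholder for real conflict resolution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: none of these names are Hudi classes or APIs.
class MetadataCommitSketch {
    // Stands in for the one lock shared by data table and metadata table writers.
    private final ReentrantLock lock = new ReentrantLock();
    // File IDs touched by each completed data table commit, in commit order.
    private final List<Set<String>> dataCommits = new ArrayList<>();
    // Entries committed to the metadata table, in commit order.
    private final List<String> metadataTimeline = new ArrayList<>();

    // A writer records how many data table commits had completed when it started.
    int beginWrite() {
        lock.lock();
        try {
            return dataCommits.size();
        } finally {
            lock.unlock();
        }
    }

    // Data table commit: conflict resolution (regular writes only) and the
    // metadata table write both happen while the lock is held.
    void commitDataTable(String instant, Set<String> fileIds, int seenCommits, boolean resolveConflicts) {
        lock.lock();
        try {
            if (resolveConflicts) {
                // Placeholder check: fail if a commit completed after this writer
                // started and touched any of the same file IDs.
                for (Set<String> later : dataCommits.subList(seenCommits, dataCommits.size())) {
                    if (!Collections.disjoint(later, fileIds)) {
                        throw new IllegalStateException("conflict detected for instant " + instant);
                    }
                }
            }
            commitToMetadataTable("deltacommit", instant);
            dataCommits.add(fileIds);
        } finally {
            lock.unlock();
        }
    }

    // Async metadata table service (compaction or cleaning): acquire the lock,
    // commit, release. No conflict resolution is attempted here.
    void commitMetadataService(String action, String instant) {
        lock.lock();
        try {
            commitToMetadataTable(action, instant);
        } finally {
            lock.unlock();
        }
    }

    // Every metadata table commit must be made by the thread holding the lock.
    private void commitToMetadataTable(String action, String instant) {
        if (!lock.isHeldByCurrentThread()) {
            throw new IllegalStateException("metadata table commit attempted without the lock");
        }
        metadataTimeline.add(action + ":" + instant);
    }

    // Returns a snapshot copy of the metadata table entries.
    List<String> metadataTimeline() {
        lock.lock();
        try {
            return new ArrayList<>(metadataTimeline);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MetadataCommitSketch table = new MetadataCommitSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int seen = table.beginWrite();
        // A regular write and an async metadata compaction race for the lock;
        // the resulting order of the two entries depends on scheduling.
        pool.submit(() -> table.commitDataTable("001", Set.of("file-1"), seen, true));
        pool.submit(() -> table.commitMetadataService("compaction", "002"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(table.metadataTimeline());
    }
}
```

In this sketch the async service never inspects other writers' work; it relies on the lock alone, which matches the presumption that conflicts were already resolved on the data table side. Passing `resolveConflicts = false` models data table compaction or clustering, which may skip the check but still write to the metadata table under the lock.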
Issue Links
- is related to: HUDI-5672 Archived Timeline as LSM Tree - Initial Impl (Closed)
- links to