Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
table-store-0.3.0
None
Description
Currently, Table Store sinks write and compact data files within the same job. While this implementation is sufficient and more economical for most users, some users may expect higher or steadier write throughput.
We have decided to support creating separate compact jobs for Table Store. This brings the following advantages:
- Write jobs can concentrate solely on writing files, so their throughput will be higher and steadier.
- With only one compact job per table, no commit conflicts will occur.
The structure of a separate compact job is sketched as follows:
- A compact job should have three vertices: one source vertex, one sink (compactor) vertex, and one commit vertex.
- The source vertex is responsible for generating records containing the partitions and buckets to be compacted.
- The sink vertex accepts records containing partitions and buckets, and compacts these buckets.
- The commit vertex commits the changes produced by the sink vertex. A user may still mistakenly create additional compact jobs for the same table, so commit conflicts can still occur. However, because compaction changes are optional, the commit vertex commits them in an at-most-once style: if a commit fails, the changes are discarded rather than retried.
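The three vertices above can be illustrated with a small in-memory simulation. This is a minimal sketch, not the Flink or Table Store API: `Table`, `CompactChange`, `source_vertex`, `sink_vertex`, `commit_vertex` and the `min_files` threshold are all hypothetical names. It also assumes a simplified conflict rule: a compaction conflicts when a file it wants to replace has already been removed by another commit.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical key type: a (partition, bucket) pair.
BucketKey = Tuple[str, int]


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a table's latest snapshot."""
    files: Dict[BucketKey, Set[str]] = field(default_factory=dict)
    snapshot_id: int = 0
    _next_file: int = 0

    def new_file_name(self) -> str:
        # Hands out unique names for compacted output files.
        self._next_file += 1
        return f"data-{self._next_file}"


@dataclass(frozen=True)
class CompactChange:
    """Files replaced (before) and produced (after) by compacting one bucket."""
    key: BucketKey
    before: FrozenSet[str]
    after: str


def source_vertex(table: Table, min_files: int = 2) -> List[BucketKey]:
    """Source vertex: emit (partition, bucket) records that need compaction."""
    return sorted(k for k, fs in table.files.items() if len(fs) >= min_files)


def sink_vertex(table: Table, keys: List[BucketKey]) -> List[CompactChange]:
    """Sink (compactor) vertex: merge all current files of each bucket into one."""
    return [CompactChange(k, frozenset(table.files[k]), table.new_file_name())
            for k in keys]


def commit_vertex(table: Table, changes: List[CompactChange]) -> bool:
    """Commit vertex, at-most-once: a conflicting commit is dropped, never retried."""
    for c in changes:
        # Conflict: some file this change replaces is no longer in the table.
        if not c.before <= table.files.get(c.key, set()):
            return False
    for c in changes:
        # Keep files that write jobs added after compaction started.
        table.files[c.key] = (table.files[c.key] - c.before) | {c.after}
    table.snapshot_id += 1
    return True


if __name__ == "__main__":
    table = Table()
    key = ("dt=2022-11-01", 0)
    table.files[key] = {"a", "b", "c"}
    table.files[("dt=2022-11-01", 1)] = {"d"}

    job_a = sink_vertex(table, source_vertex(table))
    job_b = sink_vertex(table, source_vertex(table))  # a mistakenly created second job
    table.files[key].add("e")  # a write job appends a file meanwhile

    print(commit_vertex(table, job_a))  # True
    print(commit_vertex(table, job_b))  # False: its input files are already gone
    print(sorted(table.files[key]))  # ['data-1', 'e']
```

In this model, files appended by write jobs never conflict with compaction, which mirrors why separating the jobs lets writers run undisturbed. Only a second compactor touching the same files loses, and its result is simply dropped because compaction is optional.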