Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
streaming compaction
Description
Task is to support compaction of partitions.
Rationale: Streaming partitions are composed of a large number of small files (each commit is one file). Because compaction can be expensive (e.g., converting the partition to a single ORC file), we do not compact the streaming partition when rolling it into a standard partition. This keeps rolling quick and atomic.
Compaction is performed later. The streaming partition is converted as is (typically with many small files) into a standard partition, and this new standard partition is then queued for compaction by a separate job.
This decouples the compaction feature from streaming support and makes it generally available for any partition.
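The roll-then-compact-later flow described above could be sketched roughly as follows. This is only an illustration on a local filesystem, not the actual implementation: the names `roll_partition`, `compact_partition`, `drain_compaction_queue`, and `compaction_queue` are hypothetical, a directory rename stands in for the atomic roll, and plain file concatenation stands in for conversion to a single ORC file.

```python
import os
import queue

# Hypothetical in-process queue of standard partitions awaiting compaction.
compaction_queue = queue.Queue()


def roll_partition(streaming_dir, standard_dir):
    """Convert a streaming partition into a standard one as is, then queue it."""
    # Assumption: a same-filesystem rename models the quick, atomic roll.
    os.rename(streaming_dir, standard_dir)
    compaction_queue.put(standard_dir)


def compact_partition(partition_dir, merged_name="compacted.dat"):
    """Merge the small files of a partition into one file; return the count merged."""
    # Assumption: sorted file names reflect commit order.
    names = sorted(os.listdir(partition_dir))
    merged_tmp = os.path.join(partition_dir, merged_name + ".tmp")
    with open(merged_tmp, "wb") as out:
        for name in names:
            with open(os.path.join(partition_dir, name), "rb") as src:
                out.write(src.read())
    # Publish the merged file, then delete the small files it replaces.
    os.replace(merged_tmp, os.path.join(partition_dir, merged_name))
    for name in names:
        if name != merged_name:
            os.remove(os.path.join(partition_dir, name))
    return len(names)


def drain_compaction_queue():
    """Stand-in for the separate compaction job: compact every queued partition."""
    compacted = []
    while not compaction_queue.empty():
        partition_dir = compaction_queue.get()
        compact_partition(partition_dir)
        compacted.append(partition_dir)
    return compacted
```

Because `compact_partition` takes any partition directory, it does not depend on how the partition was produced, which mirrors the decoupling from streaming support noted above.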