Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Create a new clustering strategy to execute parquet tools commands during clustering.
When some columns need to be pruned to save storage, the current clustering approach iterates over every record and removes the unused columns, which is very time-consuming. With ParquetTools, the same result can be achieved by running a command from within the clustering strategy.
The logic creates marker files so that, in the event of a failure, the existing rollback mechanism can remove the inflight files and the Parquet files.
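
The marker-and-rollback idea described above can be sketched roughly as follows. This is only an illustration using local files in Python; it does not use Hudi's actual marker or rollback APIs, and `write_with_marker`, `rollback`, `demo`, and the `.marker` suffix are hypothetical names chosen for the example. The `rewrite` callback stands in for whatever step produces the new file, for instance shelling out to a column-pruning command.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

MARKER_SUFFIX = ".marker"  # hypothetical naming, not Hudi's real marker format


def write_with_marker(target: Path, marker_dir: Path,
                      rewrite: Callable[[Path], None]) -> Path:
    """Create a marker for `target`, then run the step that writes it."""
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / (target.name + MARKER_SUFFIX)
    # The marker is written before any data, so a crash in the middle of
    # the rewrite still leaves a record of the file to clean up.
    marker.write_text(str(target))
    rewrite(target)
    return marker


def rollback(marker_dir: Path) -> List[Path]:
    """Delete every file named by a marker, then the marker itself."""
    removed = []
    for marker in sorted(marker_dir.glob("*" + MARKER_SUFFIX)):
        target = Path(marker.read_text())
        if target.exists():
            target.unlink()
            removed.append(target)
        marker.unlink()
    return removed


def demo() -> bool:
    """Simulate a failed rewrite and check that rollback leaves nothing behind."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        out = base / "part-0001.parquet"
        markers = base / ".markers"

        def failing_rewrite(path: Path) -> None:
            path.write_bytes(b"partial data")  # half-written output
            raise RuntimeError("simulated failure")

        try:
            write_with_marker(out, markers, failing_rewrite)
        except RuntimeError:
            rollback(markers)
        return not out.exists() and not any(markers.iterdir())
```

Calling `demo()` exercises the failure path: the partially written file and its marker are both removed by `rollback`.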