  Apache Hudi / HUDI-6404

Create a new clustering strategy to execute parquet tools commands during clustering


Details

    Description

      Create a new clustering strategy to execute parquet tools commands during clustering.

      When a use case calls for pruning unused columns to save storage space, the current clustering approach iterates over every record and drops the unused columns, which is very time-consuming. By using ParquetTools directly, the clustering strategy can achieve the same result by running a command, without iterating over individual records.
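      To illustrate the "run a command per file" idea, here is a minimal sketch, assuming the strategy is configured with a command template whose placeholders are filled in for each file in a clustering group. The class name CommandTemplate, the placeholder syntax ({input}, {output}, {columns}) and the tool name hypothetical-prune-tool are all hypothetical; they are not Hudi configuration keys or real ParquetTools syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: expands a command template once per input file so a
 * clustering strategy can hand whole files to an external tool instead of
 * reading and rewriting every record. Placeholder names and the tool name
 * are illustrative only, not real Hudi or ParquetTools syntax.
 */
final class CommandTemplate {
  private final List<String> template;

  CommandTemplate(List<String> template) {
    this.template = List.copyOf(template);
  }

  /** Replaces every "{key}" token inside each argument with the mapped value. */
  List<String> expand(Map<String, String> values) {
    List<String> args = new ArrayList<>(template.size());
    for (String arg : template) {
      String expanded = arg;
      for (Map.Entry<String, String> e : values.entrySet()) {
        expanded = expanded.replace("{" + e.getKey() + "}", e.getValue());
      }
      args.add(expanded);
    }
    return args;
  }

  public static void main(String[] args) {
    CommandTemplate prune = new CommandTemplate(List.of(
        "hypothetical-prune-tool", "--drop-columns", "{columns}", "{input}", "{output}"));
    List<String> cmd = prune.expand(Map.of(
        "columns", "debug_payload,raw_json",
        "input", "/table/p1/file-a.parquet",
        "output", "/table/p1/file-b.parquet"));
    // The expanded argument list could then be handed to java.lang.ProcessBuilder.
    System.out.println(String.join(" ", cmd));
  }
}
```

      Keeping the arguments as a list (rather than one shell string) avoids quoting problems when the list is passed to ProcessBuilder.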

      The strategy still creates marker files, so that if clustering fails, the existing rollback mechanism can remove the inflight files and any parquet files that were already written.
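      The marker-then-rollback flow can be sketched as follows. This is a simplified, hypothetical model written against java.nio.file only: the class MarkerGuardedRun, the FileCommand interface, the ".marker" suffix and the "clustered-" output prefix are illustrative assumptions, and real Hudi marker and timeline layouts differ. Committing a successful run (completing the instant and clearing markers) is omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical model: an inflight file marks the run as started, a marker is
 * written before each output file is created, and rollback uses the markers
 * to find and delete any outputs that were (partially) written.
 */
final class MarkerGuardedRun {

  /** Stand-in for the external command that rewrites one file. */
  interface FileCommand {
    void run(Path input, Path output) throws IOException;
  }

  private final Path inflightFile;
  private final Path markerDir;

  MarkerGuardedRun(Path inflightFile, Path markerDir) {
    this.inflightFile = inflightFile;
    this.markerDir = markerDir;
  }

  /** Runs the command for every input; on failure, rolls back and rethrows. */
  void execute(List<Path> inputs, Path outputDir, FileCommand command) throws IOException {
    Files.createDirectories(markerDir);
    Files.createFile(inflightFile);
    try {
      for (Path input : inputs) {
        Path output = outputDir.resolve("clustered-" + input.getFileName());
        // Record the intended output before creating it, so rollback can find it.
        Files.writeString(markerDir.resolve(output.getFileName() + ".marker"), output.toString());
        command.run(input, output);
      }
    } catch (IOException | RuntimeException e) {
      rollback();
      throw e;
    }
  }

  /** Deletes every output named by a marker, then the markers and the inflight file. */
  void rollback() throws IOException {
    List<Path> markers;
    try (Stream<Path> s = Files.list(markerDir)) {
      markers = s.collect(Collectors.toList());
    }
    for (Path marker : markers) {
      Files.deleteIfExists(Path.of(Files.readString(marker)));
      Files.delete(marker);
    }
    Files.deleteIfExists(markerDir);
    Files.deleteIfExists(inflightFile);
  }
}
```

      Writing the marker before the command runs is the key ordering choice: even if the command dies halfway through an output file, the marker already names that file, so rollback can clean it up.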


            People

              Assignee: Unassigned
              Reporter: Surya Prasanna Yalla (suryaprasanna)
              Votes: 0
              Watchers: 1
