Apache Hudi / HUDI-2414

Enable hot and cold data separation when ingesting data

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Writer Core
    • Labels: None

      Description

      When using Hudi to ingest an e-commerce company's item data, a large volume of updates lands in old partitions. If a single record needs an update, the whole file it belongs to must be rewritten, so nearly every commit ends up rewriting almost the whole table.

      I'm wondering whether Hudi could provide a hot/cold data separation tool that uses specific columns (such as create time and update time) to distinguish hot data from cold data, then rebuilds the table so that the two end up in different file groups. After the table is rebuilt, updates should touch far fewer files, so write performance should be much better.
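
To make the idea concrete, here is a minimal sketch in plain Python. It does not use any Hudi API. The record shape, the 30-day hot window, the example date, the file-group names, and the helper names (`is_hot`, `mixed_layout`, `separated_layout`, `files_rewritten`) are all assumptions made for illustration. It only counts how many files an update batch would touch when hot and cold records share files versus when they are packed separately.

```python
from datetime import datetime, timedelta


def is_hot(record, now, hot_window):
    """Assumed rule: a record is hot if it was updated within hot_window of now."""
    return now - record["update_time"] <= hot_window


def mixed_layout(records, records_per_file):
    """Pack records into files in arrival order, ignoring hot/cold status."""
    return {r["key"]: f"mixed-{i // records_per_file}" for i, r in enumerate(records)}


def separated_layout(records, now, hot_window, records_per_file):
    """Pack hot and cold records into separate (hypothetical) file groups."""
    hot = [r for r in records if is_hot(r, now, hot_window)]
    cold = [r for r in records if not is_hot(r, now, hot_window)]
    layout = {}
    for prefix, group in (("hot", hot), ("cold", cold)):
        for i, r in enumerate(group):
            layout[r["key"]] = f"{prefix}-{i // records_per_file}"
    return layout


def files_rewritten(layout, updated_keys):
    """Number of distinct files that hold at least one updated record."""
    return len({layout[k] for k in updated_keys})


# Illustrative data: 8 items, of which every fourth one was updated recently.
now = datetime(2021, 9, 10)  # arbitrary example date
hot_window = timedelta(days=30)  # assumed threshold
records = [
    {"key": f"item-{i}", "update_time": now - timedelta(days=1 if i % 4 == 0 else 400)}
    for i in range(8)
]

hot_keys = [r["key"] for r in records if is_hot(r, now, hot_window)]
mixed = mixed_layout(records, records_per_file=4)
separated = separated_layout(records, now, hot_window, records_per_file=4)

# Compare how many files an update to the hot records would rewrite.
print("mixed layout:", files_rewritten(mixed, hot_keys), "files rewritten")
print("separated layout:", files_rewritten(separated, hot_keys), "files rewritten")
```

In this toy data the hot records are spread across every file in the mixed layout, so updating them rewrites all of the files, while the separated layout confines the same updates to a single hot file. A real tool would also need to decide when a record moves between hot and cold, which this sketch does not address.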

            People

            • Assignee: Jian Feng (fengjian_428)
            • Reporter: Jian Feng (fengjian_428)
