Apache Hudi / HUDI-6788

Integrate FileGroupReader with MergeOnReadInputFormat for Flink


Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

    Description

      The existing MergeOnReadInputFormat implements separate iterators for each read mode: incremental read, read optimized view, snapshot view, and so on. For better performance and easier code evolution, we can integrate the new FileGroupReader. The main difference is that the FileGroupReader encapsulates the logic for merging a file slice's log files with its parquet base file, so each iterator no longer has to repeat the work of querying the file system view and composing the file slices.
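      As a rough illustration of the merging that the FileGroupReader is meant to hide from the iterators, the sketch below merges a base file with log records by key. It is a toy model, not Hudi code: `FileSliceMergeSketch`, `LogRecord`, and `readMerged` are hypothetical names, and "latest log record wins, null value means delete" is an assumed simplification of the real merge semantics.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these are not Hudi classes.
final class FileSliceMergeSketch {

    /** One log entry; a null value marks a delete for that key (assumed convention). */
    static final class LogRecord {
        final String key;
        final String value;

        LogRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Merges base file rows with log records applied in commit order.
     * Later log records overwrite or delete earlier rows with the same key.
     */
    static Map<String, String> readMerged(Map<String, String> baseFile, List<LogRecord> logRecords) {
        Map<String, String> merged = new LinkedHashMap<>(baseFile);
        for (LogRecord entry : logRecords) {
            if (entry.value == null) {
                merged.remove(entry.key);
            } else {
                merged.put(entry.key, entry.value);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> baseFile = new LinkedHashMap<>();
        baseFile.put("a", "1");
        baseFile.put("b", "2");
        baseFile.put("c", "3");
        List<LogRecord> logs = Arrays.asList(
            new LogRecord("b", "20"),  // update of an existing key
            new LogRecord("c", null),  // delete of an existing key
            new LogRecord("d", "4"));  // insert of a new key
        System.out.println(readMerged(baseFile, logs));
    }
}
```

      With this logic behind a single call, a caller only hands over a file slice and consumes merged rows, which is the simplification the description is after.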

      We can integrate step by step for the different read views: (1) snapshot queries, (2) read optimized queries, (3) skip merge queries.

      For usability and a smooth evolution, we should add a flag to enable the new reader, and keep the old code path for one or two releases.
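      One possible shape for such a flag is sketched below. Everything in it is an assumption made for illustration: the option key `example.read.file-group-reader.enabled`, the `SplitReader` interface, and both reader classes are hypothetical, and defaulting to the old path is a guess rather than something the issue specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: the option key and reader types are assumptions.
final class ReaderSelectionSketch {

    /** Hypothetical option key; the issue does not name the actual flag. */
    static final String USE_FILE_GROUP_READER = "example.read.file-group-reader.enabled";

    interface SplitReader {
        String name();
    }

    /** Stands in for the existing iterator-based code path. */
    static final class LegacyIteratorReader implements SplitReader {
        public String name() {
            return "legacy-iterators";
        }
    }

    /** Stands in for the new FileGroupReader-based code path. */
    static final class FileGroupBasedReader implements SplitReader {
        public String name() {
            return "file-group-reader";
        }
    }

    /** Picks the new path only when the flag is set; otherwise keeps the old path (assumed default). */
    static SplitReader createReader(Map<String, String> options) {
        boolean enabled = Boolean.parseBoolean(options.getOrDefault(USE_FILE_GROUP_READER, "false"));
        return enabled ? new FileGroupBasedReader() : new LegacyIteratorReader();
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        System.out.println(createReader(options).name());
        options.put(USE_FILE_GROUP_READER, "true");
        System.out.println(createReader(options).name());
    }
}
```

      Keeping the choice in one factory method means the old path can later be deleted by removing a single branch.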

      The major action items (AIs) include:

      1. implement the HoodieFlinkRecord, akin to the HoodieSparkRecord;
      2. implement the Flink-specific FileGroupReader with the HoodieFlinkRecord;
      3. Flink implements the snapshot queries using the file group reader;
      4. Flink implements the read optimized queries using the file group reader;
      5. Flink implements the skip merge queries using the file group reader.
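      To make the difference between the three query types in items 3 to 5 concrete, here is a minimal sketch of how one reader entry point could serve all of them. It is not the Hudi API: `QueryModeSketch`, `QueryMode`, and `read` are hypothetical names, rows are modelled as `{key, value}` string pairs, and deletes are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these are not Hudi classes.
final class QueryModeSketch {

    enum QueryMode { SNAPSHOT, READ_OPTIMIZED, SKIP_MERGE }

    /** Reads one file slice; each row is a {key, value} pair. */
    static List<String> read(QueryMode mode, List<String[]> baseRows, List<String[]> logRows) {
        List<String> out = new ArrayList<>();
        switch (mode) {
            case READ_OPTIMIZED: {
                // Base file only: log records are ignored.
                for (String[] row : baseRows) {
                    out.add(row[0] + "=" + row[1]);
                }
                break;
            }
            case SKIP_MERGE: {
                // Base and log rows are both emitted, without merging by key.
                for (String[] row : baseRows) {
                    out.add(row[0] + "=" + row[1]);
                }
                for (String[] row : logRows) {
                    out.add(row[0] + "=" + row[1]);
                }
                break;
            }
            case SNAPSHOT: {
                // Log rows are merged over base rows; the latest value per key wins.
                Map<String, String> merged = new LinkedHashMap<>();
                for (String[] row : baseRows) {
                    merged.put(row[0], row[1]);
                }
                for (String[] row : logRows) {
                    merged.put(row[0], row[1]);
                }
                for (Map.Entry<String, String> e : merged.entrySet()) {
                    out.add(e.getKey() + "=" + e.getValue());
                }
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> base = Arrays.asList(new String[] {"a", "1"}, new String[] {"b", "2"});
        List<String[]> logs = Arrays.asList(new String[] {"b", "20"}, new String[] {"c", "3"});
        for (QueryMode mode : QueryMode.values()) {
            System.out.println(mode + " -> " + read(mode, base, logs));
        }
    }
}
```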


          People

            ZhenqiuHuang Zhenqiu Huang
            guoyihua Ethan Guo (this is the old account; please use "yihua")