Uploaded image for project: 'Apache Hudi'
  1. Apache Hudi
  2. HUDI-3981

Flink engine support for comprehensive schema evolution

    XMLWordPrintableJSON

Details

    Description

      Context

      Currently, there is no support of schema evolution presented in RFC-33 in flink engine.
      Example 1. Assume spark changes type of column:

      set hoodie.schema.on.read.enable=true
      create table t1 (id int, val int, par string) ... partitioned by (par)
      insert into t1 values (1, 10, 'p1')
      alter table t1 alter column val type string
      insert into t1 values (2, 'val20', 'p2')
      

      When flink tries to read t1:

      create table t1 (id int, val string, par string) partitioned by (par) with (...)
      select * from t1
      

      the error occurs:

      java.lang.IllegalArgumentException: Unexpected type: INT32
      

      This is just an example, errors may differ in the case of cow/mor/snapshot/incremental/batch/streaming/rename column/add column.

      Also it is not yet possible to write data when schema is changed.
      Example 2. Case below leads to errors

      flink: write data
      flink: stop job
      spark: modify schema according to RFC-33
      flink: new job with modified schema
      flink: write data
      

      Proposal

      Provide full support in flink engine when schema is modified according to RFC-33
      add column, rename column, change type of column, drop column when:

      1. batch/streaming
      2. mor (snapshot/incremental/optimized) read/write
      3. cow (snapshot/incremental) read/write
      4. mor compaction

      Attachments

        There are no Sub-Tasks for this issue.

        Activity

          People

            trushev Alexander Trushev
            trushev Alexander Trushev
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: