[HUDI-3981] Flink engine support for comprehensive schema evolution - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.13.0
Component/s: flink
Labels:
- pull-request-available

Epic Link:
Comprehensive Schema evolution in Hudi

Description

Context

Currently, there is no support of schema evolution presented in RFC-33 in flink engine.
Example 1. Assume spark changes type of column:

set hoodie.schema.on.read.enable=true
create table t1 (id int, val int, par string) ... partitioned by (par)
insert into t1 values (1, 10, 'p1')
alter table t1 alter column val type string
insert into t1 values (2, 'val20', 'p2')

When flink tries to read t1:

create table t1 (id int, val string, par string) partitioned by (par) with (...)
select * from t1

the error occurs:

java.lang.IllegalArgumentException: Unexpected type: INT32

This is just an example, errors may differ in the case of cow/mor/snapshot/incremental/batch/streaming/rename column/add column.

Also it is not yet possible to write data when schema is changed.
Example 2. Case below leads to errors

flink: write data
flink: stop job
spark: modify schema according to RFC-33
flink: new job with modified schema
flink: write data

Proposal

Provide full support in flink engine when schema is modified according to RFC-33
add column, rename column, change type of column, drop column when:

batch/streaming
mor (snapshot/incremental/optimized) read/write
cow (snapshot/incremental) read/write
mor compaction

Attachments

Issue Links

links to

GitHub Pull Request #5830

GitHub Pull Request #7321

Sub-Tasks

There are no Sub-Tasks for this issue.

Activity

People

Assignee:: Alexander Trushev

Reporter:: Alexander Trushev

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 27/Apr/22 09:04

Updated:: 17/Feb/23 23:29

Resolved:: 17/Feb/23 23:29