[HUDI-2279] Support column name matching for insert * and update set * in merge into when sourceTable's columns contains all targetTable's columns - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.9.0
Component/s: spark
Labels:
- pull-request-available

Description

Example:

val tableName = generateTableName
// Create table
spark.sql(
 s"""
 |create table $tableName (
 | id int,
 | name string,
 | price double,
 | ts long,
 | dt string
 |) using hudi
 | location '${tmp.getCanonicalPath}/$tableName'
 | options (
 | primaryKey ='id',
 | preCombineField = 'ts'
 | )
 """.stripMargin)
spark.sql(
  s"""
     |merge into $tableName as t0
     |using (
     |  select 1 as id, '2021-05-05' as dt, 1002 as ts, 97 as price, 'a1' as name union all
     |  select 1 as id, '2021-05-05' as dt, 1003 as ts, 98 as price, 'a2' as name union all
     |  select 2 as id, '2021-05-05' as dt, 1001 as ts, 99 as price, 'a3' as name
     | ) as s0
     |on t0.id = s0.id
     |when matched then update set *
     |when not matched  then insert *
     |""".stripMargin)
spark.sql(s"select id, name, price, ts, dt from $tableName").show()

Fow now,the result is:

----------------

name

price

----------------

2	2021-05-05	99.0	99	a3
1	2021-05-05	98.0	98	a2

----------------

When the order of the column types of souceTable is different from that of the column types of targetTable

spark.sql(
  s"""
     |merge into ${tableName} as t0
     |using (
     |  select 1 as id, 'a1' as name, 1002 as ts, '2021-05-05' as dt, 97 as price union all
     |  select 1 as id, 'a2' as name, 1003 as ts, '2021-05-05' as dt, 98 as price union all
     |  select 2 as id, 'a3' as name, 1001 as ts, '2021-05-05' as dt, 99 as price
     | ) as s0
     |on t0.id = s0.id
     |when matched then update set *
     |when not matched  then insert *
     |""".stripMargin)

It will throw an exception:

[ERROR] 2021-08-05 21:48:53,941 org.apache.hudi.io.HoodieWriteHandle  - Error writing record HoodieRecord{key=HoodieKey { recordKey=id:2 partitionPath=}, currentLocation='null', newLocation='null'}
java.lang.RuntimeException: Error in execute expression: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer.
Expressions is: [boundreference() AS `id`  boundreference() AS `name`  CAST(boundreference() AS `price` AS DOUBLE)  CAST(boundreference() AS `ts` AS BIGINT)  CAST(boundreference() AS `dt` AS STRING)]
CodeBody is: {
......
Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.IntegerCaused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer at org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_366797ae_4c30_4862_8222_7be486ede4f8.eval(Unknown Source) at org.apache.spark.sql.hudi.command.payload.ExpressionPayload.org$apache$spark$sql$hudi$command$payload$ExpressionPayload$$evaluate(ExpressionPayload.scala:258) ... 18 more

Attachments

Issue Links

links to

GitHub Pull Request #3415

Activity

People

Assignee:: 董可伦

Reporter:: 董可伦

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 05/Aug/21 13:31

Updated:: 22/Aug/21 09:10

Resolved:: 15/Aug/21 02:57