Uploaded image for project: 'Apache Hudi'
  1. Apache Hudi
  2. HUDI-2279

Support column name matching for insert * and update set * in merge into when sourceTable's columns contains all targetTable's columns

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.9.0
    • spark

    Description

      Example:

      val tableName = generateTableName
      // Create table
      spark.sql(
       s"""
       |create table $tableName (
       | id int,
       | name string,
       | price double,
       | ts long,
       | dt string
       |) using hudi
       | location '${tmp.getCanonicalPath}/$tableName'
       | options (
       | primaryKey ='id',
       | preCombineField = 'ts'
       | )
       """.stripMargin)
      spark.sql(
        s"""
           |merge into $tableName as t0
           |using (
           |  select 1 as id, '2021-05-05' as dt, 1002 as ts, 97 as price, 'a1' as name union all
           |  select 1 as id, '2021-05-05' as dt, 1003 as ts, 98 as price, 'a2' as name union all
           |  select 2 as id, '2021-05-05' as dt, 1001 as ts, 99 as price, 'a3' as name
           | ) as s0
           |on t0.id = s0.id
           |when matched then update set *
           |when not matched  then insert *
           |""".stripMargin)
      spark.sql(s"select id, name, price, ts, dt from $tableName").show()

      Fow now,the result is:

      ----------------

      id name price ts dt

      ----------------

      2 2021-05-05 99.0 99 a3
      1 2021-05-05 98.0 98 a2

      ----------------

      When the order of the column types of souceTable is different from that of the column types of targetTable

       

      spark.sql(
        s"""
           |merge into ${tableName} as t0
           |using (
           |  select 1 as id, 'a1' as name, 1002 as ts, '2021-05-05' as dt, 97 as price union all
           |  select 1 as id, 'a2' as name, 1003 as ts, '2021-05-05' as dt, 98 as price union all
           |  select 2 as id, 'a3' as name, 1001 as ts, '2021-05-05' as dt, 99 as price
           | ) as s0
           |on t0.id = s0.id
           |when matched then update set *
           |when not matched  then insert *
           |""".stripMargin)

       

      It will throw an exception:

      [ERROR] 2021-08-05 21:48:53,941 org.apache.hudi.io.HoodieWriteHandle  - Error writing record HoodieRecord{key=HoodieKey { recordKey=id:2 partitionPath=}, currentLocation='null', newLocation='null'}
      java.lang.RuntimeException: Error in execute expression: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer.
      Expressions is: [boundreference() AS `id`  boundreference() AS `name`  CAST(boundreference() AS `price` AS DOUBLE)  CAST(boundreference() AS `ts` AS BIGINT)  CAST(boundreference() AS `dt` AS STRING)]
      CodeBody is: {
      ......
      Caused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.IntegerCaused by: java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer at org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_366797ae_4c30_4862_8222_7be486ede4f8.eval(Unknown Source) at org.apache.spark.sql.hudi.command.payload.ExpressionPayload.org$apache$spark$sql$hudi$command$payload$ExpressionPayload$$evaluate(ExpressionPayload.scala:258) ... 18 more

       

      Attachments

        Issue Links

          Activity

            People

              dongkelun 董可伦
              dongkelun 董可伦
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: