Apache Hudi / HUDI-5986

An empty preCombineKey should never be stored in hoodie.properties


Details

    Description

      Overview:
      We found that hoodie.properties stores an empty preCombineKey when the table has no preCombineKey. This empty preCombineKey then causes the following exception when inserting data:

      Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[id, name, price]
      	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:557)
      	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1134)
      	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$createHoodieRecordRdd$1$$anonfun$apply$5.apply(HoodieSparkSqlWriter.scala:1127)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
      	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:193)
      	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
      	at org.apache.spark.scheduler.Task.run(Task.scala:123)
      	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
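      The message indicates that the writer looked up a field whose name is the empty string. The minimal Python sketch below is a simplified model of a dotted-path field lookup, loosely based on the behavior implied by HoodieAvroUtils.getNestedFieldVal in the trace; it is not Hudi's actual code. The record is modeled as a plain dict, and FieldNotFound is a hypothetical stand-in for HoodieException. It shows why an empty preCombineKey cannot match any schema field:

```python
class FieldNotFound(Exception):
    """Hypothetical stand-in for org.apache.hudi.exception.HoodieException."""


def get_nested_field_val(record: dict, field_name: str):
    """Simplified model of a dotted-path lookup (illustration only).

    Each dot-separated part of field_name must name a field of the
    current (possibly nested) record, otherwise the lookup fails.
    """
    current = record
    for part in field_name.split("."):
        if not isinstance(current, dict) or part not in current:
            acceptable = list(current.keys()) if isinstance(current, dict) else []
            # Message shaped like the one in the stack trace above.
            raise FieldNotFound(
                f"{field_name}(Part -{part}) field not found in record. "
                f"Acceptable fields were :[{', '.join(acceptable)}]"
            )
        current = current[part]
    return current


record = {"id": 1, "name": "name1", "price": 1.1}
print(get_nested_field_val(record, "price"))  # a real column resolves normally
try:
    get_nested_field_val(record, "")  # an empty preCombineKey
except FieldNotFound as err:
    print(err)
```

      Because "".split(".") yields [""], the empty name is checked as a single empty part and is rejected for every record, regardless of the data being written.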
      

      Steps to Reproduce:

      -- 1. create a table without preCombineKey
      CREATE TABLE default.test_hudi_default_cm (
        uuid int,
        name string,
        price double
      ) USING hudi
      options (
       primaryKey='uuid');
      
      -- 2. set the write operation to insert
      set hoodie.datasource.write.operation=insert;
      set hoodie.merge.allow.duplicate.on.inserts=true;
      
      -- 3. insert data
      insert into default.test_hudi_default_cm select 1, 'name1', 1.1;
      
      -- 4. insert overwrite
      insert overwrite table default.test_hudi_default_cm select 2, 'name3', 1.1;
      
      -- 5. this insert fails with the exception above
      insert into default.test_hudi_default_cm select 1, 'name3', 1.1;
      

      Root Cause:
      When INSERT OVERWRITE TABLE is run in SQL but the configured write operation (hoodie.datasource.write.operation) is something else, here insert, Hudi re-creates the table and, in doing so, stores the default empty preCombineKey in hoodie.properties.
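      One way to enforce the rule in the title is to drop a blank precombine field before the table properties are persisted. The Python sketch below is hypothetical: the helper names are invented for illustration, and the keys hoodie.table.precombine.field and hoodie.table.recordkey.fields are assumed to be the names Hudi uses in hoodie.properties.

```python
# Assumption: property names as they appear in hoodie.properties.
PRECOMBINE_FIELD_KEY = "hoodie.table.precombine.field"
RECORDKEY_FIELDS_KEY = "hoodie.table.recordkey.fields"


def build_table_properties(table_props: dict) -> dict:
    """Hypothetical helper: copy table properties, dropping a blank precombine field.

    A missing, empty, or whitespace-only precombine field is omitted entirely,
    so the table is recorded as having no preCombineKey at all.
    """
    props = dict(table_props)
    value = props.get(PRECOMBINE_FIELD_KEY)
    if value is None or not str(value).strip():
        props.pop(PRECOMBINE_FIELD_KEY, None)
    return props


def render_properties(props: dict) -> str:
    """Hypothetical helper: render sorted key=value lines, like a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


# Table from the reproduction: primaryKey='uuid', no preCombineKey.
props = build_table_properties({RECORDKEY_FIELDS_KEY: "uuid", PRECOMBINE_FIELD_KEY: ""})
print(render_properties(props))
```

      Omitting the key, rather than writing it with an empty value, means later inserts never try to resolve an empty field name.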


    People

      Assignee: Unassigned
      Reporter: Wechar
      Votes: 0
      Watchers: 2
