Apache Hudi / HUDI-5949

Check the write operation configured by the user for better troubleshooting


Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: configs

    Description

       Background:

      We found that inserting data into a Hudi table from Spark SQL can fail with a HoodieException: (Part -) field not found in record. Acceptable fields were :[uuid, name, price]

      org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20230317222153522
      	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:64)
      	......
      	at org.apache.hudi.index.simple.HoodieSimpleIndex.fetchRecordLocationsForAffectedPartitions(HoodieSimpleIndex.java:142)
      	at org.apache.hudi.index.simple.HoodieSimpleIndex.tagLocationInternal(HoodieSimpleIndex.java:113)
      	at org.apache.hudi.index.simple.HoodieSimpleIndex.tagLocation(HoodieSimpleIndex.java:91)
      	at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:51)
      	at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:34)
      	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:53)
      	... 52 more
      Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[uuid, name, price]
      	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:530)
      	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$11(HoodieSparkSqlWriter.scala:305)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
      	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
      	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
      	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
      	at org.apache.spark.scheduler.Task.run(Task.scala:131)
      	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1509)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
      

      Steps to Reproduce:

      -- 1. create a table without preCombineKey
      CREATE TABLE default.test_hudi_default (
        uuid int,
        name string,
        price double
      ) USING hudi;
      
      -- 2. set the write operation to upsert
      set hoodie.datasource.write.operation=upsert;
      
      -- 3. insert data; the exception above occurs
      insert into default.test_hudi_default select 1, 'name1', 1.1;
      

      Root Cause:
      Hudi does not support upsert on a table that has no preCombineKey, but the exception message never says so and can easily confuse users. The immediate workaround is to keep the default insert operation, or to recreate the table with a preCombineField.
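
      The clue is the empty field name in "(Part -)": during an upsert the writer fetches the preCombine field from every incoming record, and for a table without a preCombineKey that field name is the empty string. A minimal, self-contained sketch of that lookup (the class name is illustrative, and it assumes the four-argument getNestedFieldVal overload seen in the stack trace):

      import org.apache.avro.Schema;
      import org.apache.avro.SchemaBuilder;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.hudi.avro.HoodieAvroUtils;

      public class EmptyPreCombineLookup {
        public static void main(String[] args) {
          // Same shape as the repro table: uuid, name, price.
          Schema schema = SchemaBuilder.record("test_hudi_default").fields()
              .requiredInt("uuid")
              .requiredString("name")
              .requiredDouble("price")
              .endRecord();
          GenericRecord record = new GenericData.Record(schema);
          record.put("uuid", 1);
          record.put("name", "name1");
          record.put("price", 1.1);

          // The table defines no preCombineKey, so the field name passed down
          // is the empty string and the lookup throws:
          //   HoodieException: (Part -) field not found in record.
          //   Acceptable fields were :[uuid, name, price]
          HoodieAvroUtils.getNestedFieldVal(record, "", false, false);
        }
      }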

      Improvement:
      We can validate the user-configured write operation and throw a more specific exception; this will help users immediately understand what is wrong. See the sketch below.
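
      For instance, a fail-fast check along these lines could run before tagging. This is only a sketch: WriteOperationValidator and its wiring are hypothetical, while WriteOperationType and HoodieException are existing Hudi classes.

      import org.apache.hudi.common.model.WriteOperationType;
      import org.apache.hudi.exception.HoodieException;

      // Hypothetical fail-fast helper: run before tagging/writing so users get
      // a configuration error instead of a field-lookup failure inside the index.
      public class WriteOperationValidator {
        public static void validate(WriteOperationType operation, String preCombineField) {
          boolean hasPreCombine = preCombineField != null && !preCombineField.isEmpty();
          if (operation == WriteOperationType.UPSERT && !hasPreCombine) {
            throw new HoodieException(
                "hoodie.datasource.write.operation is set to 'upsert', but the table does not "
                    + "define a preCombine field. Create the table with a preCombineField, "
                    + "or use the 'insert' operation instead.");
          }
        }
      }

      With such a check, the reproduction above would fail at configuration time with a message that names the missing preCombine field rather than a field-lookup error deep inside the index.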

            People

              Assignee: Unassigned
              Reporter: wechar
              Votes: 0
              Watchers: 1
