Apache Hudi / HUDI-5949

Check the write operation configured by the user for better troubleshooting


Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: configs

    Description

       Background:

      We found that inserting data into a Hudi table from Spark SQL can fail with a HoodieException: (Part -) field not found in record. Acceptable fields were :[uuid, name, price]

      org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20230317222153522
      	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:64)
      	......
      	at org.apache.hudi.index.simple.HoodieSimpleIndex.fetchRecordLocationsForAffectedPartitions(HoodieSimpleIndex.java:142)
      	at org.apache.hudi.index.simple.HoodieSimpleIndex.tagLocationInternal(HoodieSimpleIndex.java:113)
      	at org.apache.hudi.index.simple.HoodieSimpleIndex.tagLocation(HoodieSimpleIndex.java:91)
      	at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:51)
      	at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:34)
      	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:53)
      	... 52 more
      Caused by: org.apache.hudi.exception.HoodieException: (Part -) field not found in record. Acceptable fields were :[uuid, name, price]
      	at org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:530)
      	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$write$11(HoodieSparkSqlWriter.scala:305)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
      	at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
      	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
      	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
      	at org.apache.spark.scheduler.Task.run(Task.scala:131)
      	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1509)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
      

      Steps to Reproduce:

      -- 1. create a table without preCombineKey
      CREATE TABLE default.test_hudi_default (
        uuid int,
        name string,
        price double
      ) USING hudi;
      
      -- 2. set the write operation to upsert
      set hoodie.datasource.write.operation=upsert;
      
      -- 3. insert data; the exception above occurs
      insert into default.test_hudi_default select 1, 'name1', 1.1;
      

      Root Cause:
      Hudi does not support upsert on a table that has no preCombineKey, but the exception message never says so and can easily confuse users. The immediate workaround is to keep the default insert operation, or to recreate the table with a preCombineField.
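
      The clue is the empty field name in "(Part -)": during an upsert the writer fetches the preCombine field from every incoming record, and for a table without a preCombineKey that field name is the empty string. A minimal, self-contained sketch of that lookup (the class name is illustrative, and it assumes the four-argument getNestedFieldVal overload seen in the stack trace):

      import org.apache.avro.Schema;
      import org.apache.avro.SchemaBuilder;
      import org.apache.avro.generic.GenericData;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.hudi.avro.HoodieAvroUtils;

      public class EmptyPreCombineLookup {
        public static void main(String[] args) {
          // Same shape as the repro table: uuid, name, price.
          Schema schema = SchemaBuilder.record("test_hudi_default").fields()
              .requiredInt("uuid")
              .requiredString("name")
              .requiredDouble("price")
              .endRecord();
          GenericRecord record = new GenericData.Record(schema);
          record.put("uuid", 1);
          record.put("name", "name1");
          record.put("price", 1.1);

          // The table defines no preCombineKey, so the field name passed down
          // is the empty string and the lookup throws:
          //   HoodieException: (Part -) field not found in record.
          //   Acceptable fields were :[uuid, name, price]
          HoodieAvroUtils.getNestedFieldVal(record, "", false, false);
        }
      }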

      Improvement:
      We can validate the user-configured write operation and throw a more specific exception; this will help users immediately understand what is wrong. See the sketch below.
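
      For instance, a fail-fast check along these lines could run before tagging. This is only a sketch: WriteOperationValidator and its wiring are hypothetical, while WriteOperationType and HoodieException are existing Hudi classes.

      import org.apache.hudi.common.model.WriteOperationType;
      import org.apache.hudi.exception.HoodieException;

      // Hypothetical fail-fast helper: run before tagging/writing so users get
      // a configuration error instead of a field-lookup failure inside the index.
      public class WriteOperationValidator {
        public static void validate(WriteOperationType operation, String preCombineField) {
          boolean hasPreCombine = preCombineField != null && !preCombineField.isEmpty();
          if (operation == WriteOperationType.UPSERT && !hasPreCombine) {
            throw new HoodieException(
                "hoodie.datasource.write.operation is set to 'upsert', but the table does not "
                    + "define a preCombine field. Create the table with a preCombineField, "
                    + "or use the 'insert' operation instead.");
          }
        }
      }

      With such a check, the reproduction above would fail at configuration time with a message that names the missing preCombine field rather than a field-lookup error deep inside the index.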

            People

              Assignee: Unassigned
              Reporter: wechar
              Votes: 0
              Watchers: 1
