SPARK-32478: Error message to show the schema mismatch in gapply with Arrow vectorization


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SparkR
    • Labels: None

    Description

      Currently, the error message is confusing when the output schema declared in gapply does not match the types in the R data.frame actually returned by the function:

      ./bin/sparkR --conf spark.sql.execution.arrow.sparkr.enabled=true
      
      df <- createDataFrame(list(list(a=1L, b="2")))
      count(gapply(df, "a", function(key, group) { group }, structType("a int, b int")))
      
        org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 2.0 failed 1 times, most recent failure: Lost task 43.0 in stage 2.0 (TID 2, 192.168.35.193, executor driver): java.lang.UnsupportedOperationException
      	at org.apache.spark.sql.vectorized.ArrowColumnVector$ArrowVectorAccessor.getInt(ArrowColumnVector.java:212)
      	...
      

      We should probably also document that the output types must always match the declared schema.
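
      For reference, a minimal sketch of the same call with a matching schema (the structType strings and the as.integer cast below are illustrative, not part of this issue):

      df <- createDataFrame(list(list(a=1L, b="2")))
      # Option 1: declare b as a string so the schema matches what the function returns
      count(gapply(df, "a", function(key, group) { group }, structType("a int, b string")))

      # Option 2: cast inside the function so the declared "a int, b int" schema is actually produced
      count(gapply(df, "a", function(key, group) {
        group$b <- as.integer(group$b)
        group
      }, structType("a int, b int")))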

      Attachments

        Activity


          People

            Assignee: gurwls223 Hyukjin Kwon
            Reporter: gurwls223 Hyukjin Kwon

            Dates

              Created:
              Updated:
              Resolved:
