SPARK-12624

When a schema is specified, we should give a better error message if the actual row length doesn't match

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: PySpark, SQL
    • Labels: None

      Description

      The following code snippet reproduces this issue:

      # Run in the pyspark shell, where sc and sqlContext are predefined.
      from pyspark.sql import Row
      from pyspark.sql.types import StructType, StructField, IntegerType, StringType

      # The schema declares two fields, but each Row below supplies only one.
      schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
      rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x))
      df = sqlContext.createDataFrame(rdd, schema)
      df.show()
      

      An unintuitive ArrayIndexOutOfBoundsException is thrown in this case:

      ...
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
              at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
              at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
              at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
      ...
      

      We should give a better error message here.
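
      For illustration, here is a minimal sketch of the kind of up-front check that would produce a readable error instead. This is not the actual fix; the validate_row helper is hypothetical, and the snippet assumes the same pyspark shell session (sc, sqlContext) and schema as above:

      from pyspark.sql import Row

      def validate_row(row, schema):
          # Hypothetical helper: fail fast with an explicit message instead of
          # letting Catalyst hit an ArrayIndexOutOfBoundsException much later.
          if len(row) != len(schema.fields):
              raise ValueError(
                  "Row length %d does not match schema length %d: row=%r, schema=%s"
                  % (len(row), len(schema.fields), row, schema.simpleString()))
          return row

      rdd = sc.parallelize(range(10)).map(lambda x: validate_row(Row(a=x), schema))
      df = sqlContext.createDataFrame(rdd, schema)
      df.show()  # the ValueError message now surfaces in the task failure
                 # instead of the opaque ArrayIndexOutOfBoundsException

      Constructing rows that actually satisfy the schema, e.g. Row(a=x, b=str(x)), avoids the mismatch entirely.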


              People

              • Assignee: Cheng Lian
              • Reporter: Reynold Xin
              • Votes: 0
              • Watchers: 8
