Details
Description
The following code snippet reproduces this issue:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.types import Row

schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x))
df = sqlContext.createDataFrame(rdd, schema)
df.show()
An unintuitive ArrayIndexOutOfBoundsException is thrown in this case:
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
...
We should give a better error message here.
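One way a clearer message could be produced is to validate each row's width against the schema before conversion, so the mismatch is reported up front rather than surfacing as an ArrayIndexOutOfBoundsException deep in Catalyst. A minimal sketch of such a check in plain Python (the helper name and error text are hypothetical, not Spark's actual implementation):

```python
def verify_row_against_schema(row_fields, schema_field_names):
    """Hypothetical pre-conversion check: raise a descriptive error
    when a row supplies a different number of values than the schema
    declares, instead of failing later with an index error."""
    if len(row_fields) != len(schema_field_names):
        raise ValueError(
            "Row has %d field(s) but schema declares %d: %s"
            % (len(row_fields), len(schema_field_names),
               ", ".join(schema_field_names)))

# A row built as Row(a=x) carries only one field, so against the
# two-field schema [a, b] the check raises a readable error:
try:
    verify_row_against_schema(["a"], ["a", "b"])
except ValueError as e:
    print(e)  # Row has 1 field(s) but schema declares 2: a, b
```

With a check like this, the failure in the snippet above would name the missing field count instead of an opaque array index.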
Issue Links
- relates to SPARK-13748: Document behavior of createDataFrame and rows with omitted fields (Resolved)