Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.5.0, 3.5.1, 3.5.2
Fix Version/s: None
Component/s: None
Description
In the scenario below, `sorted.rdd` hangs indefinitely instead of throwing the expected exception, even though the schema does not match the data types.
```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._
import scala.util.Try

val data: Seq[Row] = Seq(
  Row(1, "a"),
  Row(2, "b"),
  Row(3, "c")
)

// Intentional mismatch: "id" is declared as StringType while the rows contain Ints,
// so an exception is expected once the rows are actually evaluated.
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("value", StringType)
))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  schema
)

val sorted = df.orderBy(col("value"))

Try(sorted.rdd)
sorted.rdd  // hangs instead of failing
```
A less simplified version of this error is happening to us when using Holden Karau's Spark Testing Base. As a workaround we are forcing an action before the assert (see the sketch below), but I assume this is not the expected behaviour.
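For context, a minimal sketch of what "forcing an action before the assert" looks like, reusing `df` from the repro above; the `count()` call is just one possible action, and everything else here (the `Try` match, the printlns, the mention of an assertion afterwards) is illustrative only, not our actual test code:

```scala
import org.apache.spark.sql.functions.col
import scala.util.{Failure, Success, Try}

// Continues from the repro: `df` has "id" declared as StringType
// while the underlying rows contain Ints.
val sorted = df.orderBy(col("value"))

// Workaround sketch: force an action first so that the schema/data
// mismatch is (hopefully) surfaced here, instead of a later call to
// sorted.rdd hanging inside the test assertion.
Try(sorted.count()) match {
  case Failure(e) => println(s"Mismatch surfaced eagerly: ${e.getMessage}")
  case Success(n) => println(s"Count unexpectedly succeeded with $n rows")
}

// Only after the forced action do we run the assertion that previously hung.
```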