SPARK-49686

Spark gets stuck while evaluating the RDD of a sorted DataFrame with a wrong schema


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0, 3.5.1, 3.5.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      In the scenario below, `sorted.rdd` hangs forever instead of throwing the expected exception for a schema that does not match the data types.

       

      import org.apache.spark.sql.Row
      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types._
      import scala.util.Try
      
      val data: Seq[Row] = Seq(
        Row(1, "a"),
        Row(2, "b"),
        Row(3, "c")
      )
      
      // "id" is declared as StringType although the rows contain Ints,
      // so evaluating the data is expected to fail with an exception.
      val schema = StructType(Seq(
        StructField("id", StringType),
        StructField("value", StringType)
      ))
      
      val df = spark.createDataFrame(
        spark.sparkContext.parallelize(data), schema
      )
      
      val sorted = df.orderBy(col("value"))
      
      // Both of these hang indefinitely instead of throwing.
      Try(sorted.rdd)
      sorted.rdd
      
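      For contrast, a minimal sketch of the same pipeline with a schema that actually matches the data (reusing `data` and the imports from the snippet above); here `.rdd` evaluates normally, which suggests the hang is specific to the mismatched schema:

      // Assumed contrast case: "id" declared as IntegerType to match the Int values.
      val matchingSchema = StructType(Seq(
        StructField("id", IntegerType),
        StructField("value", StringType)
      ))
      
      val okDf = spark.createDataFrame(
        spark.sparkContext.parallelize(data), matchingSchema
      )
      
      // Evaluates and returns the three rows as expected.
      okDf.orderBy(col("value")).rdd.collect()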

       

       

      A less simplified version of this error is happening to us when using Holden Karau's Spark Testing Base; as a workaround we force an action before the assert, but I guess this is not the expected behaviour.
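      A minimal sketch of that workaround, assuming spark-testing-base's DataFrameSuiteBase together with ScalaTest's AnyFunSuite (the test class and its contents here are illustrative, not our actual code):

      import com.holdenkarau.spark.testing.DataFrameSuiteBase
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types._
      import org.scalatest.funsuite.AnyFunSuite
      
      class SortedDfSpec extends AnyFunSuite with DataFrameSuiteBase {
        test("comparing a sorted DataFrame built with a mismatched schema") {
          val data = Seq(Row(1, "a"), Row(2, "b"), Row(3, "c"))
          val schema = StructType(Seq(
            StructField("id", StringType), // mismatched on purpose, as in the repro above
            StructField("value", StringType)
          ))
          val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
          val sorted = df.orderBy(col("value"))
      
          // Workaround: force an action first so any problem with the data surfaces
          // as a test failure here instead of the suite hanging inside the assert.
          sorted.collect()
      
          assertDataFrameEquals(sorted, sorted)
        }
      }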

          People

            Assignee: Unassigned
            Reporter: Alvaro Berdonces
