SPARK-49686

Spark gets stuck while evaluating the RDD of a sorted DataFrame with a wrong schema


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0, 3.5.1, 3.5.2
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      In the scenario below, `sorted.rdd` hangs forever instead of throwing the expected exception for a schema that does not match the data types.

       

      import org.apache.spark.sql.Row
      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types._
      import scala.util.Try
      
      val data: Seq[Row] = Seq(
        Row(1, "a"),
        Row(2, "b"),
        Row(3, "c")
      )
      
      // "id" is declared as StringType although the rows contain Ints,
      // so evaluating the data is expected to fail with an exception.
      val schema = StructType(Seq(
        StructField("id", StringType),
        StructField("value", StringType)
      ))
      
      val df = spark.createDataFrame(
        spark.sparkContext.parallelize(data), schema
      )
      
      val sorted = df.orderBy(col("value"))
      
      // Both of these hang indefinitely instead of throwing.
      Try(sorted.rdd)
      sorted.rdd
      
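      For contrast, a minimal sketch of the same pipeline with a schema that actually matches the data (reusing `data` and the imports from the snippet above); here `.rdd` evaluates normally, which suggests the hang is specific to the mismatched schema:

      // Assumed contrast case: "id" declared as IntegerType to match the Int values.
      val matchingSchema = StructType(Seq(
        StructField("id", IntegerType),
        StructField("value", StringType)
      ))
      
      val okDf = spark.createDataFrame(
        spark.sparkContext.parallelize(data), matchingSchema
      )
      
      // Evaluates and returns the three rows as expected.
      okDf.orderBy(col("value")).rdd.collect()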

       

       

      A less simplified version of this error is happening to us when using Holden Karau's Spark Testing Base; as a workaround we force an action before the assert, but I guess this is not the expected behaviour.
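      A minimal sketch of that workaround, assuming spark-testing-base's DataFrameSuiteBase together with ScalaTest's AnyFunSuite (the test class and its contents here are illustrative, not our actual code):

      import com.holdenkarau.spark.testing.DataFrameSuiteBase
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.functions.col
      import org.apache.spark.sql.types._
      import org.scalatest.funsuite.AnyFunSuite
      
      class SortedDfSpec extends AnyFunSuite with DataFrameSuiteBase {
        test("comparing a sorted DataFrame built with a mismatched schema") {
          val data = Seq(Row(1, "a"), Row(2, "b"), Row(3, "c"))
          val schema = StructType(Seq(
            StructField("id", StringType), // mismatched on purpose, as in the repro above
            StructField("value", StringType)
          ))
          val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
          val sorted = df.orderBy(col("value"))
      
          // Workaround: force an action first so any problem with the data surfaces
          // as a test failure here instead of the suite hanging inside the assert.
          sorted.collect()
      
          assertDataFrameEquals(sorted, sorted)
        }
      }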

          People

            Assignee: Unassigned
            Reporter: Alvaro Berdonces
