Spark / SPARK-28702

Display useful error message (instead of NPE) for invalid Dataset operations (e.g. calling actions inside of transformations)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      In Spark, SparkContext and SparkSession can only be used on the driver, not on executors. For example, this means that you cannot call someDataset.collect() inside of a Dataset or RDD transformation.
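      The rule above can be illustrated with a minimal sketch. It assumes spark-sql 3.x on the classpath and a local-mode session; the object name DriverOnlyExample, the method scaleByCount, and the sample data are hypothetical. The commented-out line is the invalid pattern. The working version runs the action on the driver first and captures only its plain result in the closure.

```scala
import org.apache.spark.sql.SparkSession

object DriverOnlyExample {
  // Hypothetical helper: scales each element of one Dataset by the row count of another.
  def scaleByCount(spark: SparkSession): Array[Long] = {
    import spark.implicits._

    val numbers = Seq(1, 2, 3).toDS()
    val lookup  = Seq(10, 20).toDS()

    // INVALID: count() is an action that needs the driver-side SparkSession.
    // The function passed to numbers.map runs on executors, where that
    // reference is not available, so do NOT write:
    // numbers.map(x => x * lookup.count())

    // VALID: run the action on the driver first...
    val lookupCount: Long = lookup.count()
    // ...then capture only the plain Long value in the closure.
    numbers.map(x => x * lookupCount).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // local mode, for illustration only
      .appName("driver-only-example")
      .getOrCreate()
    try println(scaleByCount(spark).mkString(",")) // prints 2,4,6
    finally spark.stop()
  }
}
```

      If the second Dataset were large, a join between the two Datasets would usually scale better than collecting a result to the driver. Collecting is shown here only because it is the smallest fix.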

      When Spark serializes RDDs and Datasets, references to SparkContext and SparkSession are nulled out, either because they are marked @transient or because the Closure Cleaner removes them. As a result, RDD and Dataset methods that use these driver-side-only objects (e.g. actions or transformations) will see null references and may fail with a NullPointerException. For example, this is the trace from code that, via a chain of calls, tried to collect() a Dataset inside of a Dataset.map operation:

      Caused by: java.lang.NullPointerException
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution$lzycompute(Dataset.scala:3027)
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution(Dataset.scala:3025)
      at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
      at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)
      [...] 

      The resulting NPE can be very confusing to users.
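      The nulling-out of a @transient reference can be reproduced without Spark. The following is a minimal, stdlib-only sketch under stated assumptions: DriverOnlyContext and FakeDataset are hypothetical stand-ins for SparkSession and Dataset. A plain Java-serialization round trip stands in for shipping a closure to an executor; Spark's Closure Cleaner is not modeled.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a driver-only object such as SparkSession.
// Deliberately NOT Serializable, like a real driver-side context.
class DriverOnlyContext {
  def runJob(): Seq[Int] = Seq(1, 2, 3)
}

// Hypothetical stand-in for a Dataset: the object itself is serializable, but
// its reference to the driver-only context is @transient, so the field is
// skipped during serialization and comes back as null.
class FakeDataset(@transient val ctx: DriverOnlyContext) extends Serializable {
  // An "action" that needs the driver-side context; throws an NPE if ctx is null.
  def collect(): Seq[Int] = ctx.runJob()
}

object TransientDemo {
  // Serializes and deserializes an object, roughly what happens when a
  // closure capturing it is sent to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

      On the original object, collect() works. On the copy returned by TransientDemo.roundTrip, ctx is null, so collect() fails with a bare NullPointerException, which is the same kind of unhelpful failure shown in the trace above.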

      In SPARK-5063 I added some logic to throw clearer error messages when performing similar invalid actions on RDDs. This ticket's scope is to implement similar logic for Datasets.
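      The general pattern is to route every use of the transient reference through an accessor that checks for null and fails with a descriptive message instead of an NPE. A minimal, stdlib-only sketch follows. GuardedDataset, DriverOnlyContext, and GuardDemo are hypothetical. The IllegalStateException and its wording are illustrative only and are not Spark's actual exception type or message.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a driver-only object such as SparkSession.
class DriverOnlyContext {
  def runJob(): Seq[Int] = Seq(1, 2, 3)
}

// Hypothetical stand-in for a Dataset whose driver-side reference is dropped
// when the object is serialized.
class GuardedDataset(@transient private val _ctx: DriverOnlyContext) extends Serializable {
  // Every use of the context goes through this accessor, so a missing
  // context produces a clear error instead of a NullPointerException.
  private def ctx: DriverOnlyContext = {
    if (_ctx == null) {
      throw new IllegalStateException(
        "Dataset transformations and actions can only be invoked by the driver, " +
          "not inside of other transformations (e.g. calling collect() inside map).")
    }
    _ctx
  }

  def collect(): Seq[Int] = ctx.runJob()
}

object GuardDemo {
  // Java-serialization round trip, standing in for shipping a closure to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

      Keeping the check in a single accessor means the guard only needs to be written once, and every action or transformation that touches the context inherits the clearer error.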

    People

      Assignee: Shivu Sondur (shivusondur@gmail.com)
      Reporter: Josh Rosen (joshrosen)
      Votes: 0
      Watchers: 3
