Spark / SPARK-28702

Display useful error message (instead of NPE) for invalid Dataset operations (e.g. calling actions inside of transformations)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      In Spark, SparkContext and SparkSession can only be used on the driver, not on executors. For example, this means that you cannot call someDataset.collect() inside of a Dataset or RDD transformation.

      When Spark serializes RDDs and Datasets, references to SparkContext and SparkSession are nulled out (either because the fields are marked @transient or by the Closure Cleaner). As a result, RDD and Dataset methods that reference these driver-side-only objects (e.g. actions or transformations) will see null references on executors and may fail with a NullPointerException. For example, code which (via a chain of calls) tried to collect() a dataset inside of a Dataset.map operation failed with:

      Caused by: java.lang.NullPointerException
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution$lzycompute(Dataset.scala:3027)
      at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution(Dataset.scala:3025)
      at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
      at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)
      [...] 

      The resulting NPE can be very confusing to users.
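The failure mode can be reproduced outside of Spark with a small model. The sketch below is plain Python, not Spark code: `FakeSession` and `FakeDataset` are hypothetical stand-ins, and `copy.deepcopy` (which honors `__getstate__`) is assumed to play the role of the serialization round trip that ships an object to an executor.

```python
import copy


class FakeSession:
    """Hypothetical stand-in for a driver-only object such as SparkSession."""

    def run(self, rows):
        return list(rows)


class FakeDataset:
    """Toy dataset that holds a reference to the driver-only session."""

    def __init__(self, session, rows):
        self.session = session
        self.rows = rows

    def __getstate__(self):
        # Mimic a @transient field: the session is nulled out when the
        # object's state is captured for copying or serialization.
        state = self.__dict__.copy()
        state["session"] = None
        return state

    def collect(self):
        # No guard here, so a missing session surfaces as a low-level error.
        return self.session.run(self.rows)


driver_side = FakeDataset(FakeSession(), [1, 2, 3])

# Simulate shipping the dataset to an executor. deepcopy rebuilds the object
# from __getstate__, so the copy arrives without its session.
executor_side = copy.deepcopy(driver_side)

try:
    executor_side.collect()
    error = None
except AttributeError as exc:
    # Python's analogue of the NullPointerException in the trace above.
    error = exc
```

The error raised on the copy mentions only a missing attribute on `None`, and says nothing about the operation having been invoked in the wrong place, which is the same problem the stack trace illustrates.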

      In SPARK-5063 I added some logic to throw clearer error messages when performing similar invalid actions on RDDs. This ticket's scope is to implement similar logic for Datasets.
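One way to picture the requested behavior is a guard that checks the driver-only reference before using it and raises a descriptive error when it is missing. This is a minimal plain-Python sketch under stated assumptions, not the change made in Spark: `DriverOnlyError`, `GuardedDataset`, and the message wording are all hypothetical, and `copy.deepcopy` stands in for serialization to an executor.

```python
import copy


class DriverOnlyError(RuntimeError):
    """Hypothetical error raised when a driver-only operation runs elsewhere."""


class GuardedDataset:
    """Toy dataset whose session reference is dropped when it is copied out."""

    def __init__(self, session, rows):
        self._session = session
        self.rows = rows

    def __getstate__(self):
        # Mimic a @transient field: do not carry the session along.
        state = self.__dict__.copy()
        state["_session"] = None
        return state

    @property
    def session(self):
        # The guard: turn a missing session into an explanatory error
        # instead of letting a null reference fail somewhere deeper.
        if self._session is None:
            raise DriverOnlyError(
                "Dataset transformations and actions can only be invoked by "
                "the driver, not inside of other Dataset transformations."
            )
        return self._session

    def collect(self):
        _ = self.session  # fail fast with the clear message
        return list(self.rows)


on_driver = GuardedDataset(session=object(), rows=[1, 2, 3])
on_executor = copy.deepcopy(on_driver)  # arrives without its session

try:
    on_executor.collect()
    caught = None
except DriverOnlyError as exc:
    caught = exc
```

Routing every use of the session through a single checked accessor keeps the guard in one place, so each action or transformation that needs the session gets the same message without repeating the check.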

              People

              • Assignee:
                shivusondur@gmail.com Shivu Sondur
              • Reporter:
                joshrosen Josh Rosen
              • Votes:
                0
              • Watchers:
                3
