Description
In Spark, SparkContext and SparkSession can only be used on the driver, not on executors. For example, this means that you cannot call someDataset.collect() inside of a Dataset or RDD transformation.
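For illustration, here is a minimal sketch of code that violates this rule; the object name NestedCollectExample and the datasets lookup and ds are made up for the example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object NestedCollectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-collect-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val lookup = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    val ds = Seq("a", "b").toDS()

    // INVALID: the function passed to map runs on executors, but collect()
    // is a driver-side action. The serialized copy of `lookup` that reaches
    // the executors has its SparkSession reference nulled out, so this
    // fails at runtime.
    val broken = ds.map(key => lookup.filter(col("key") === key).collect().length)
    broken.show()
  }
}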
When Spark serializes RDDs and Datasets, references to SparkContext and SparkSession are nulled out (either by being marked @transient or via the Closure Cleaner). As a result, RDD and Dataset methods that use these driver-side-only objects (e.g. actions or transformations) will see null references and may fail with a NullPointerException. For example, here is the error from code which (via a chain of calls) tried to collect() a Dataset inside of a Dataset.map operation:
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution$lzycompute(Dataset.scala:3027)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$rddQueryExecution(Dataset.scala:3025)
  at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
  at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)
  [...]
The resulting NPE can be very confusing to users.
In SPARK-5063 I added logic to throw clearer error messages when performing these kinds of invalid operations on RDDs. The scope of this ticket is to implement similar logic for Datasets.
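As one possible shape for such a check (a sketch, not the actual patch; DriverSideGuard and assertOnDriver are hypothetical names, and the real logic would live inside Dataset itself), the nulled-out driver-side reference can be detected directly and turned into an actionable error instead of a NullPointerException:

import org.apache.spark.SparkException
import org.apache.spark.sql.SparkSession

object DriverSideGuard {
  // On executors, a deserialized Dataset's SparkSession reference is null,
  // so a null session is a reliable signal that we are not on the driver.
  def assertOnDriver(session: SparkSession): Unit = {
    if (session == null) {
      throw new SparkException(
        "Dataset transformations and actions can only be invoked by the " +
          "driver, not inside of other Dataset transformations; for example, " +
          "ds1.map(x => ds2.count() * x) is invalid because the count action " +
          "cannot be performed inside of the ds1.map transformation.")
    }
  }
}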
Issue Links
- relates to: SPARK-5063 Display more helpful error messages for several invalid operations (Resolved)
- links to