Description
>>> spark.conf.set("spark.sql.crossJoin.enabled", "false")
>>> spark.range(1).join(spark.range(1), how="inner").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o66.join.
: java.lang.NullPointerException
	at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	...
>>> spark.conf.set("spark.sql.crossJoin.enabled", "true")
>>> spark.range(1).join(spark.range(1), how="inner").show()
...
py4j.protocol.Py4JJavaError: An error occurred while calling o84.join.
: java.lang.NullPointerException
	at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	...
Omitting the join columns as above throws an exception, with spark.sql.crossJoin.enabled set to either false or true.
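Until this is fixed, a possible workaround (my own sketch, not part of the report; the renamed column id2 is only illustrative) is to pass the join condition explicitly, or to use an explicit crossJoin(), both of which avoid the code path that dereferences the missing join expression:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

left = spark.range(1)
right = spark.range(1).withColumnRenamed("id", "id2")

# Supplying the join condition explicitly works:
left.join(right, on=left["id"] == right["id2"], how="inner").show()

# An explicit cross join (available since 2.1) also works when no condition is needed:
left.crossJoin(right).show()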
The same join, with the columns omitted, works in 2.0.2:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>> spark.range(1).join(spark.range(1), how="inner").show()
+---+---+
| id| id|
+---+---+
|  0|  0|
+---+---+
but it appears to be broken as of Spark 2.1.0.
This looks like a small, trivial regression.
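For anyone checking whether a given build is affected, a small probe along these lines (my own sketch, not from the report) attempts the problematic call and reports the outcome together with the Spark version:

from pyspark.sql import SparkSession
from py4j.protocol import Py4JJavaError

spark = SparkSession.builder.getOrCreate()

try:
    spark.range(1).join(spark.range(1), how="inner").show()
    print("OK on Spark %s: join with columns omitted works" % spark.version)
except Py4JJavaError as err:
    # On affected builds this surfaces the NullPointerException raised in Dataset.join.
    print("Regression on Spark %s: %s" % (spark.version, err.java_exception))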
Issue Links
- relates to: SPARK-14761 PySpark DataFrame.join should reject invalid join methods even when join columns are not specified (Resolved)