[SPARK-21264] Omitting columns with 'how' specified in join in PySpark throws NPE - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.1.0, 2.2.0
Fix Version/s: 2.3.0
Component/s: PySpark
Labels: None

Description

>>> spark.conf.set("spark.sql.crossJoin.enabled", "false")
>>> spark.range(1).join(spark.range(1), how="inner").show()
Traceback (most recent call last):
...
py4j.protocol.Py4JJavaError: An error occurred while calling o66.join.
: java.lang.NullPointerException
	at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...

>>> spark.conf.set("spark.sql.crossJoin.enabled", "true")
>>> spark.range(1).join(spark.range(1), how="inner").show()
...
py4j.protocol.Py4JJavaError: An error occurred while calling o84.join.
: java.lang.NullPointerException
	at org.apache.spark.sql.Dataset.join(Dataset.scala:931)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...

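Calling join with 'how' specified but without join columns, as above, throws a NullPointerException, regardless of the spark.sql.crossJoin.enabled setting.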

This works in 2.0.2:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.2
      /_/

Using Python version 2.7.10 (default, Jul 30 2016 19:40:32)
SparkSession available as 'spark'.
>>> spark.range(1).join(spark.range(1), how="inner").show()
+---+---+
| id| id|
+---+---+
|  0|  0|
+---+---+

but it no longer works as of Spark 2.1.0.

This looks like a small, trivial regression.
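The failure pattern suggests that the missing column list reaches the JVM as null when only 'how' is given. The sketch below models one way to guard against that in plain Python. It is not PySpark source and not the change from the linked pull request: the helper name resolve_join_args, its return shape, and the choice to substitute an empty list are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple, Union

JoinOn = Optional[Union[str, List[str]]]


def resolve_join_args(on: JoinOn = None,
                      how: Optional[str] = None) -> Tuple[List[str], str]:
    """Hypothetical helper: normalize join arguments before a JVM call.

    Models the idea that a missing column list should become an empty
    list, so that null is never handed to the JVM side.
    """
    if how is None:
        how = "inner"  # inner is the default join type
    if not isinstance(how, str):
        raise TypeError("how should be a string")
    if on is None:
        on = []  # assumption: empty list stands in for "no join columns"
    elif isinstance(on, str):
        on = [on]  # a single column name becomes a one-element list
    return list(on), how


print(resolve_join_args(how="inner"))  # ([], 'inner')
print(resolve_join_args(on="id"))      # (['id'], 'inner')
```

With a guard of this kind, the call from the description would carry an empty column list and a join type of "inner" rather than a null column list.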

Issue Links

relates to: SPARK-14761 PySpark DataFrame.join should reject invalid join methods even when join columns are not specified (Resolved)

links to: [GitHub] Pull Request #18484 (HyukjinKwon)

People

Assignee: Hyukjin Kwon
Reporter: Hyukjin Kwon
Votes: 0
Watchers: 2

Dates

Created: 30/Jun/17 08:06
Updated: 12/Dec/22 17:51
Resolved: 04/Jul/17 02:35