Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0, 2.3.1, 2.3.2
Description
There appears to be a regression in Dataset.except between Spark 2.2 and 2.3. Below is the code to reproduce it:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val inputDF = spark.sqlContext.createDataFrame(
  spark.sparkContext.parallelize(Seq(
    Row("0", "john", "smith", "john@smith.com"),
    Row("1", "jane", "doe", "jane@doe.com"),
    Row("2", "apache", "spark", "spark@apache.org"),
    Row("3", "foo", "bar", null)
  )),
  StructType(List(
    StructField("id", StringType, nullable=true),
    StructField("first_name", StringType, nullable=true),
    StructField("last_name", StringType, nullable=true),
    StructField("email", StringType, nullable=true)
  ))
)

val exceptDF = inputDF.transform( toProcessDF =>
  toProcessDF.filter(
    (
      col("first_name").isin(Seq("john", "jane"): _*)
        and col("last_name").isin(Seq("smith", "doe"): _*)
    )
    or col("email").isin(List(): _*)
  )
)

inputDF.except(exceptDF).show()
Output with Spark 2.2:
+---+----------+---------+----------------+
| id|first_name|last_name|           email|
+---+----------+---------+----------------+
|  2|    apache|    spark|spark@apache.org|
|  3|       foo|      bar|            null|
+---+----------+---------+----------------+
Output with Spark 2.3:
+---+----------+---------+----------------+
| id|first_name|last_name|           email|
+---+----------+---------+----------------+
|  2|    apache|    spark|spark@apache.org|
+---+----------+---------+----------------+
Note: changing the last line to
inputDF.except(exceptDF.cache()).show()
produces identical output for both Spark 2.2 and 2.3.
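The row that goes missing is the one with a null email, which points at SQL three-valued logic. A plausible explanation (an assumption, not confirmed by this report alone) is that Spark 2.3's optimizer rewrites an except over a filtered child into a negated filter (along the lines of the ReplaceExceptWithFilter rule introduced in 2.3); since NOT(null) is null and filter only keeps rows whose predicate is true, a row whose condition evaluates to null is silently dropped. Caching exceptDF materializes it and would block such a rewrite, which would explain the workaround. Below is a minimal sketch of that pitfall, using a simplified, hypothetical predicate rather than Spark's actual rewrite:

// Minimal sketch of the suspected mechanism, not Spark's actual rewrite:
// under three-valued logic NOT(null) = null, and filter() keeps a row only
// when its predicate is true, so a negated filter drops null-predicate rows.
import org.apache.spark.sql.functions.{coalesce, lit, not}

// email === "john@smith.com" is null (not false) for row 3's null email,
// so its negation is also null and the row is dropped.
inputDF.filter(not(col("email") === "john@smith.com")).show()
// -> keeps only ids 1 and 2; id 3 (null email) is also dropped

// Null-safe variant: coalescing the predicate to false before negating it
// keeps id 3, matching the Spark 2.2 output of except.
inputDF.filter(not(coalesce(col("email") === "john@smith.com", lit(false)))).show()
// -> keeps ids 1, 2, and 3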