Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Versions: 2.3.1, 2.4.0
Description
For example, if an array column is used (where the lower and upper bounds for its column batch are null), it looks like it wrongly filters all data out:
scala> import org.apache.spark.sql.functions
import org.apache.spark.sql.functions

scala> val df = Seq(Array("a", "b"), Array("c", "d")).toDF("arrayCol")
df: org.apache.spark.sql.DataFrame = [arrayCol: array<string>]

scala> df.filter(df.col("arrayCol").eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))).show()
+--------+
|arrayCol|
+--------+
|  [a, b]|
+--------+

scala> df.cache().filter(df.col("arrayCol").eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))).show()
+--------+
|arrayCol|
+--------+
+--------+
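For reference, the same reproduction can be run outside the shell. The sketch below is a minimal standalone version of the steps above, assuming a local SparkSession; the object name EqNullSafeCacheRepro and the local[*] master are illustrative only. Both show() calls should return the single row [a, b], but after cache() the in-memory scan returns an empty result.

import org.apache.spark.sql.{SparkSession, functions}

object EqNullSafeCacheRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("eqNullSafe-cache-repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Array("a", "b"), Array("c", "d")).toDF("arrayCol")

    // Null-safe equality against a literal array value.
    val predicate = df.col("arrayCol")
      .eqNullSafe(functions.array(functions.lit("a"), functions.lit("b")))

    // Before caching: returns the matching row [a, b].
    df.filter(predicate).show()

    // After caching: the filter over the cached relation returns no rows,
    // which is the behavior reported in this issue.
    df.cache().filter(predicate).show()

    spark.stop()
  }
}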