Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
3.0.0, 3.1.0
Description
If the size of a collection passed to isInCollection is bigger than spark.sql.optimizer.inSetConversionThreshold, the method can return wrong results for some inputs. For example:
val set = (0 to 20).map(_.toString).toSet
val data = Seq("1").toDF("x")
println(set.contains("1"))
data.select($"x".isInCollection(set).as("isInCollection")).show()
true
+--------------+
|isInCollection|
+--------------+
|         false|
+--------------+
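For anyone who wants to check this outside spark-shell, here is a minimal standalone sketch of the same reproduction. It assumes a local SparkSession and an affected Spark build on the classpath. The object name `IsInCollectionRepro`, the `local[1]` master, and the side-by-side comparison with `isin` are additions for illustration and are not part of the original report. No expected output is listed because the result depends on the Spark version in use.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the name is not from the original report.
object IsInCollectionRepro {
  // Same 21-element string set as in the description.
  val set: Set[String] = (0 to 20).map(_.toString).toSet

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is enough to trigger the issue.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isInCollection-repro")
      .getOrCreate()
    import spark.implicits._ // needed for toDF and the $"..." column syntax

    val data = Seq("1").toDF("x")

    // Reference answer computed in plain Scala, independent of Spark.
    val expected = set.contains("1")

    // The call under test, plus the varargs isin variant for comparison.
    val viaIsInCollection =
      data.select($"x".isInCollection(set)).first().getBoolean(0)
    val viaIsin =
      data.select($"x".isin(set.toSeq: _*)).first().getBoolean(0)

    // Print the threshold in effect so the set size can be compared against it.
    val threshold = spark.conf.get("spark.sql.optimizer.inSetConversionThreshold")
    println(s"inSetConversionThreshold = $threshold, set size = ${set.size}")
    println(s"set.contains(\"1\")       = $expected")
    println(s"isInCollection(set)     = $viaIsInCollection")
    println(s"isin(set.toSeq: _*)     = $viaIsin")

    spark.stop()
  }
}
```

If `isInCollection(set)` disagrees with `set.contains("1")` while the set size is above the printed threshold, the build is showing the behaviour described here.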
Issue Links
- is caused by
  - SPARK-12593 Convert basic resolved logical plans back to SQL query strings (Resolved)
  - SPARK-29048 Query optimizer slow when using Column.isInCollection() with a large size collection (In Progress)