Spark / SPARK-31553

Wrong result of isInCollection for large collections

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.1.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:

      Description

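      If the size of a collection passed to isInCollection is larger than spark.sql.optimizer.inSetConversionThreshold (10 by default), the method can return wrong results for some inputs. In the example below, the set has 21 elements and contains "1", yet isInCollection returns false: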

          val set = (0 to 20).map(_.toString).toSet
          val data = Seq("1").toDF("x")
          println(set.contains("1"))
          data.select($"x".isInCollection(set).as("isInCollection")).show()
      
      true
      +--------------+
      |isInCollection|
      +--------------+
      |         false|
      +--------------+
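
The reproduction above relies on spark-shell, which supplies `spark` and its implicits. A standalone variant might look like the sketch below. It compares `isInCollection` against plain Scala set membership and against `isin`. The object name `IsInCollectionCheck`, the `run` helper, and the app name are hypothetical. Treating `isin` as a second reference path is an assumption, not something this report states.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; not part of Spark or of this report.
object IsInCollectionCheck {

  // Returns (expected, viaIsInCollection, viaIsin) for the value "1".
  def run(spark: SparkSession): (Boolean, Boolean, Boolean) = {
    import spark.implicits._

    // 21 elements: more than the default inSetConversionThreshold of 10.
    val set = (0 to 20).map(_.toString).toSet
    val data = Seq("1").toDF("x")

    // Plain Scala membership is the reference answer.
    val expected = set.contains("1")

    // The call under test.
    val viaIsInCollection =
      data.select($"x".isInCollection(set)).as[Boolean].head()

    // Assumption: isin with the same values serves as a second reference path.
    val viaIsin =
      data.select($"x".isin(set.toSeq: _*)).as[Boolean].head()

    (expected, viaIsInCollection, viaIsin)
  }

  def main(args: Array[String]): Unit = {
    // Local session for a standalone run; spark-shell already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-31553-check") // hypothetical app name
      .getOrCreate()
    try {
      val (expected, viaIsInCollection, viaIsin) = run(spark)
      println(s"expected=$expected isInCollection=$viaIsInCollection isin=$viaIsin")
    } finally {
      spark.stop()
    }
  }
}
```

On a build that contains the fix, all three values should agree. A mismatch between `expected` and the `isInCollection` value would indicate the behavior described in this issue.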
      

              People

              • Assignee: Maxim Gekk (maxgekk)
              • Reporter: Maxim Gekk (maxgekk)
