Spark / SPARK-31563

Failure of InSet.sql for UTF8String collection


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.5, 3.0.0, 3.1.0
    • Fix Version/s: 2.4.6, 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The InSet expression operates on collections of Catalyst's internal types. This is visible in the optimizer rule that replaces In with InSet, where the elements of In's list are evaluated to Catalyst's internal values: https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L253-L254

              if (newList.length > SQLConf.get.optimizerInSetConversionThreshold) {
                val hSet = newList.map(e => e.eval(EmptyRow))
                InSet(v, HashSet() ++ hSet)
              }
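
To illustrate what the evaluation step above produces, here is a minimal sketch. It assumes a spark-shell session (or any project with spark-catalyst on the classpath); the names `list` and `allInternal` are introduced here for illustration only.

```scala
import org.apache.spark.sql.catalyst.expressions.{EmptyRow, Literal}
import org.apache.spark.unsafe.types.UTF8String

// Literal.apply converts each external String into Catalyst's internal UTF8String.
val list = Seq(Literal("a"), Literal("b"))

// Same evaluation step as in the optimizer rule quoted above.
val hSet = list.map(e => e.eval(EmptyRow)).toSet

// Every element is an internal value, not a java.lang.String.
val allInternal = hSet.forall(_.isInstanceOf[UTF8String])
```

So any set that the optimizer hands to InSet for a string column holds UTF8String values, which is the same kind of input used in the failing call in this report.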
      

      This code predates the optimization in https://github.com/apache/spark/pull/25754, which made another wrong assumption about the element types of the collection.

      If InSet accepts only Catalyst's internal types, the following code should not fail:

      InSet(Literal("a"), Set("a", "b").map(UTF8String.fromString)).sql
      

      but it fails with the exception:

      Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
      java.lang.RuntimeException: Unsupported literal type class org.apache.spark.unsafe.types.UTF8String a
      	at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:88)
      	at org.apache.spark.sql.catalyst.expressions.InSet.$anonfun$sql$2(predicates.scala:522)
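
The stack trace shows that the single-argument Literal.apply, which expects external Scala/Java values, is being called with an internal value. As a hedged sketch of how a SQL list could be rendered from internal values (not necessarily the change applied in the fix), one can either pass the data type explicitly to the Literal constructor or convert the value back to an external one first. The helpers `listSql` and `listSqlViaScala` are hypothetical and not part of Spark; spark-catalyst is assumed to be on the classpath.

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// Internal values, as InSet would hold them for a string column.
val values: Seq[Any] = Seq("a", "b").map(UTF8String.fromString)

// Hypothetical helper: the two-argument Literal constructor takes an
// internal value together with its Catalyst data type.
def listSql(values: Seq[Any], dataType: DataType): String =
  values.map(v => Literal(v, dataType).sql).mkString(", ")

// Hypothetical alternative: convert each internal value to its external
// Scala counterpart, then let the single-argument Literal.apply infer the type.
def listSqlViaScala(values: Seq[Any], dataType: DataType): String =
  values
    .map(v => Literal(CatalystTypeConverters.convertToScala(v, dataType)).sql)
    .mkString(", ")

val sqlDirect = listSql(values, StringType)
val sqlViaScala = listSqlViaScala(values, StringType)
```

Both variants need the data type of the elements, which InSet can take from its child expression.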
      

       

            People

            • Assignee:
              maxgekk Max Gekk
              Reporter:
              maxgekk Max Gekk
            • Votes:
              0
              Watchers:
              3
