Spark / SPARK-28441

PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

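      I found this while working on SPARK-28277 (https://issues.apache.org/jira/browse/SPARK-28277). Calling a registered pandas UDF inside a correlated scalar subquery fails with an UnsupportedOperationException: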

      >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
      >>> @pandas_udf("string", PandasUDFType.SCALAR)
      ... def noop(x):
      ...     return x.apply(str)
      ... 
      >>> spark.udf.register("udf", noop)
      <function noop at 0x111b5f9d8>
      >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES (\"one\", 1), (\"two\", 2), (\"three\", 3), (\"one\", NULL) AS t1(k, v)")
      DataFrame[]
      >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t2 AS SELECT * FROM VALUES (\"one\", 1), (\"two\", 22), (\"one\", 5), (\"one\", NULL), (NULL, 5) AS t2(k, v)")
      DataFrame[]
      >>> spark.sql("SELECT t1.k FROM t1 WHERE t1.v <= (SELECT udf(max(udf(t2.v))) FROM t2 WHERE udf(t2.k) = udf(t1.k))").show()
      py4j.protocol.Py4JJavaError: An error occurred while calling o65.showString.
      : java.lang.UnsupportedOperationException: Cannot evaluate expression: udf(null)
       at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:296)
       at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:295)
       at org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:52)
      
    People

      • Assignee: viirya L. C. Hsieh
      • Reporter: huaxingao Huaxin Gao
      • Votes: 0
      • Watchers: 3
