Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The current implementation has a limitation: users cannot mix vectorized and non-vectorized UDFs in the same Project. This gets worse because the optimizer can combine consecutive Projects into a single one. For example:

      
      applied_df = df.withColumn('regular', my_regular_udf('total', 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty'))
      
      

      This returns the following error:

      
      IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
      
      java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170)
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146)
       at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
       at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
       at scala.collection.immutable.List.foreach(List.scala:381)
       at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
       at scala.collection.immutable.List.map(List.scala:285)
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146)
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118)
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
       at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
       at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
       at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
       at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
       at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
       at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
       at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114)
       at org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
       at org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
       at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
       at scala.collection.immutable.List.foldLeft(List.scala:84)
       at org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113)
       at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100)
       at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:99)
       at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3312)
       at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2750)
       ...
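
The failure mode described above can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not Spark's actual code: the names `Udf`, `Project`, `collapse`, and `extract` are hypothetical stand-ins for the optimizer's Project-merging step and for the check in `ExtractPythonUDFs` that raises the error. It shows why two separate `withColumn` calls, each valid on its own, can still fail once they are merged into one Project.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical labels for the two UDF kinds; these are not Spark constants.
REGULAR = "regular"
VECTORIZED = "vectorized"


@dataclass
class Udf:
    name: str
    eval_type: str


@dataclass
class Project:
    udfs: List[Udf]


def collapse(projects: List[Project]) -> Project:
    """Merge consecutive Projects into a single one (toy optimizer step)."""
    return Project([u for p in projects for u in p.udfs])


def extract(project: Project) -> Set[str]:
    """Toy version of the check: reject a Project that mixes UDF kinds."""
    kinds = {u.eval_type for u in project.udfs}
    if len(kinds) > 1:
        raise ValueError("Can not mix vectorized and non-vectorized UDFs")
    return kinds


# Each withColumn call corresponds to its own Project in this model.
first = Project([Udf("my_regular_udf", REGULAR)])
second = Project([Udf("my_pandas_udf", VECTORIZED)])

# Checked separately, both Projects pass.
separate_ok = (extract(first), extract(second))

# After the Projects are merged, the same check fails.
merged = collapse([first, second])
try:
    extract(merged)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this model the user never wrote a mixed Project; the mix only appears after `collapse`, which is why splitting the UDFs across separate `withColumn` calls does not avoid the error.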
      
      

            People

            • Assignee:
              icexelloss Li Jin
              Reporter:
              smilegator Xiao Li
            • Votes:
              0
              Watchers:
              10
