  1. Spark
  2. SPARK-33268

Fix bugs for casting data from/to PythonUserDefinedType


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.8, 3.0.2, 3.1.0
    • Fix Version/s: 2.4.8, 3.0.2, 3.1.0
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      This PR intends to fix bugs in casting data from/to PythonUserDefinedType. The following sequence of queries reproduces the issue:

       
      >>> from pyspark.sql import Row
      >>> from pyspark.sql.functions import col
      >>> from pyspark.sql.types import *
      >>> from pyspark.testing.sqlutils import *
      >>> 
      >>> row = Row(point=ExamplePoint(1.0, 2.0))
      >>> df = spark.createDataFrame([row])
      >>> df.select(col("point").cast(PythonOnlyUDT()))
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
          jdf = self._jdf.select(self._jcols(*cols))
        File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
        File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
          return f(*a, **kw)
        File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
      : java.lang.NullPointerException
      	at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
      	at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
      	at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
      	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
      	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
      

       

      The root cause is that PythonUserDefinedType#userClass is always null, so the isAssignableFrom call in UserDefinedType#acceptsType throws a NullPointerException. To fix it, this PR defines acceptsType in PythonUserDefinedType and filters out the null case in UserDefinedType#acceptsType.
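
The failure mode and the two-part fix can be illustrated with a small pure-Python model. This is only a sketch, not Spark's Scala source: the names ModelUDT, ModelPythonUDT, accepts_type and py_udt_name are hypothetical stand-ins, Python's issubclass plays the role of isAssignableFrom, and the rule that two Python-only UDTs match by name is an assumption made for illustration.

```python
# Pure-Python model of the failure and the fix; NOT Spark's Scala source.
# Every class, method, and attribute name here is a hypothetical stand-in.


class ModelUDT:
    """Stand-in for a UDT that records the user class it wraps."""

    def __init__(self, user_class):
        self.user_class = user_class

    def accepts_type_before_fix(self, other):
        # Unguarded check: issubclass raises when either user_class is None,
        # analogous to isAssignableFrom hitting a null userClass.
        return isinstance(other, ModelUDT) and issubclass(
            other.user_class, self.user_class
        )

    def accepts_type(self, other):
        # Guarded check: filter out the None case before comparing classes.
        return (
            isinstance(other, ModelUDT)
            and self.user_class is not None
            and other.user_class is not None
            and issubclass(other.user_class, self.user_class)
        )


class ModelPythonUDT(ModelUDT):
    """Stand-in for a Python-only UDT, whose user class is always None."""

    def __init__(self, py_udt_name):
        super().__init__(user_class=None)
        self.py_udt_name = py_udt_name

    def accepts_type(self, other):
        # Assumed rule for this sketch: Python-only UDTs match by name.
        return (
            isinstance(other, ModelPythonUDT)
            and other.py_udt_name == self.py_udt_name
        )
```

In this model, calling accepts_type_before_fix with a ModelPythonUDT argument raises a TypeError, the Python analogue of the NullPointerException in the stack trace above, while both accepts_type variants return a boolean without raising.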

            People

            • Assignee:
              maropu Takeshi Yamamuro
            • Reporter:
              maropu Takeshi Yamamuro