  1. Spark
  2. SPARK-33268

Fix bugs for casting data from/to PythonUserDefinedType

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.8, 3.0.2, 3.1.0
    • Fix Version/s: 2.4.8, 3.0.2, 3.1.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      This PR intends to fix bugs in casting data from/to PythonUserDefinedType. The following sequence of queries reproduces the issue:

       
      >>> from pyspark.sql import Row
      >>> from pyspark.sql.functions import col
      >>> from pyspark.sql.types import *
      >>> from pyspark.testing.sqlutils import *
      >>> 
      >>> row = Row(point=ExamplePoint(1.0, 2.0))
      >>> df = spark.createDataFrame([row])
      >>> df.select(col("point").cast(PythonOnlyUDT()))
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/dataframe.py", line 1402, in select
          jdf = self._jdf.select(self._jcols(*cols))
        File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
        File "/Users/maropu/Repositories/spark/spark-master/python/pyspark/sql/utils.py", line 111, in deco
          return f(*a, **kw)
        File "/Users/maropu/Repositories/spark/spark-master/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o44.select.
      : java.lang.NullPointerException
      	at org.apache.spark.sql.types.UserDefinedType.acceptsType(UserDefinedType.scala:84)
      	at org.apache.spark.sql.catalyst.expressions.Cast$.canCast(Cast.scala:96)
      	at org.apache.spark.sql.catalyst.expressions.CastBase.checkInputDataTypes(Cast.scala:267)
      	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved$lzycompute(Cast.scala:290)
      	at org.apache.spark.sql.catalyst.expressions.CastBase.resolved(Cast.scala:290)
      

       

      The root cause is that PythonUserDefinedType#userClass is always null, so the isAssignableFrom call in UserDefinedType#acceptsType throws a NullPointerException. To fix it, this PR defines acceptsType in PythonUserDefinedType and filters out the null case in UserDefinedType#acceptsType.
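
The two-part fix described above can be illustrated with a small pure-Python model. This is only a sketch and not Spark's actual Scala code: the class names mirror the ones in the description, but `user_class`, `py_udt`, `accepts_type`, and the `Point` classes are hypothetical stand-ins, and `issubclass` is used as a rough analogue of Java's `isAssignableFrom`.

```python
from typing import Optional


class UserDefinedType:
    """Toy stand-in for a JVM-backed UDT; user_class may be None (hypothetical)."""

    def __init__(self, user_class: Optional[type] = None):
        self.user_class = user_class

    def accepts_type(self, other: object) -> bool:
        if not isinstance(other, UserDefinedType):
            return False
        # Filter out the None case before the assignability check, so the
        # check is never attempted with a missing class on either side.
        if self.user_class is None or other.user_class is None:
            return False
        # Rough analogue of self.userClass.isAssignableFrom(other.userClass).
        return issubclass(other.user_class, self.user_class)


class PythonUserDefinedType(UserDefinedType):
    """Toy stand-in for a Python-only UDT: no JVM class, identified by a name."""

    def __init__(self, py_udt: str):
        super().__init__(user_class=None)  # always None, as in the bug report
        self.py_udt = py_udt

    def accepts_type(self, other: object) -> bool:
        # Its own rule: compare the Python-side identifier, never user_class.
        return isinstance(other, PythonUserDefinedType) and self.py_udt == other.py_udt


class Point:  # hypothetical user class
    pass


class Point3D(Point):  # hypothetical subclass
    pass


jvm_point = UserDefinedType(Point)
jvm_point3d = UserDefinedType(Point3D)
py_a = PythonUserDefinedType("example.PythonOnlyUDT")
py_b = PythonUserDefinedType("example.PythonOnlyUDT")
py_c = PythonUserDefinedType("example.OtherUDT")

print(jvm_point.accepts_type(jvm_point3d))  # subclass is assignable
print(jvm_point.accepts_type(py_a))         # None is filtered, no exception
print(py_a.accepts_type(py_b))              # same Python UDT identifier
print(py_a.accepts_type(py_c))              # different identifier
```

In this model the base-class guard turns what would have been an exception into a plain `False`, while the override gives Python-only UDTs a comparison rule that does not depend on `user_class` at all.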

          People

            Assignee: maropu Takeshi Yamamuro
            Reporter: maropu Takeshi Yamamuro