Spark / SPARK-25666

Internally document type conversion between Python data and SQL types in UDFs


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark
    • Labels: None

    Description

      Currently, the type coercion performed by Python UDFs is not clearly defined. See also https://github.com/apache/spark/pull/22610.

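      This JIRA aims to document the type conversion logic internally. In the table below, each row is the SQL return type declared for the UDF, each column is the Python value the UDF returns, and each cell is the resulting value; 'X' means the conversion throws an exception. For instance: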

          +---------------+----+----+----+----+----+------------------------------+------------------------------+----+---------------+---------+----------------------------+------------+----+------------+---------+---------+
          |   Type \ Value|None|True|   1|   a|   a|                    1970-01-01|           1970-01-01 00:00:00| 1.0|array('i', [1])|      [1]|                        (1,)|         ABC|   1|    {'a': 1}| Row(a=1)| Row(a=1)|
          +---------------+----+----+----+----+----+------------------------------+------------------------------+----+---------------+---------+----------------------------+------------+----+------------+---------+---------+
          |           null|None|None|None|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|        None|        X|        X|
          |        boolean|None|True|None|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|        None|        X|        X|
          |        tinyint|None|None|   1|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|        None|        X|        X|
          |       smallint|None|None|   1|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|        None|        X|        X|
          |            int|None|None|   1|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|        None|        X|        X|
          |         bigint|None|None|   1|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|        None|        X|        X|
          |         string|None|true|   1|   a|   a|java.util.GregorianCalendar...|java.util.GregorianCalendar...| 1.0|    [I@2d03fe27|      [1]|[Ljava.lang.Object;@5ae74a34| [B@6e96d01e|   1|       {a=1}|        X|        X|
          |           date|None|   X|   X|   X|   X|                    1970-01-01|                    1970-01-01|   X|              X|        X|                           X|           X|   X|           X|        X|        X|
          |      timestamp|None|   X|   X|   X|   X|                             X|           1970-01-01 00:00:00|   X|              X|        X|                           X|           X|   X|           X|        X|        X|
          |          float|None|None|None|None|None|                          None|                          None| 1.0|           None|     None|                        None|        None|None|        None|        X|        X|
          |         double|None|None|None|None|None|                          None|                          None| 1.0|           None|     None|                        None|        None|None|        None|        X|        X|
          |     array<int>|None|None|None|None|None|                          None|                          None|None|            [1]|      [1]|                         [1]|[65, 66, 67]|None|        None|        X|        X|
          |         binary|None|None|None|   a|   a|                          None|                          None|None|           None|     None|                        None|         ABC|None|        None|        X|        X|
          |  decimal(10,0)|None|None|None|None|None|                          None|                          None|None|           None|     None|                        None|        None|   1|        None|        X|        X|
          |map<string,int>|None|None|None|None|None|                          None|                          None|None|           None|     None|                        None|        None|None|   {u'a': 1}|        X|        X|
          | struct<_1:int>|None|   X|   X|   X|   X|                             X|                             X|   X|              X|Row(_1=1)|                   Row(_1=1)|           X|   X|Row(_1=None)|Row(_1=1)|Row(_1=1)|
          +---------------+----+----+----+----+----+------------------------------+------------------------------+----+---------------+---------+----------------------------+------------+----+------------+---------+---------+
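      To check UDF behavior against this table in tests, it can help to load the table programmatically. The following is a minimal sketch, assuming the ASCII layout shown in the table; `parse_conversion_table`, `conversion_fails` and `TABLE_EXCERPT` are hypothetical names (not PySpark APIs), and the excerpt copies only the first five value columns and three SQL-type rows. Cells are keyed by column index rather than by header because some headers repeat (the two `a` columns and the two `Row(a=1)` columns).

```python
# Hypothetical helper (not part of PySpark): parses the ASCII conversion table
# into a lookup keyed by (SQL type, value-column index).

# Excerpt of the table: first five value columns, three SQL-type rows.
TABLE_EXCERPT = r"""
+---------------+----+----+----+----+----+
|   Type \ Value|None|True|   1|   a|   a|
+---------------+----+----+----+----+----+
|        boolean|None|True|None|None|None|
|         string|None|true|   1|   a|   a|
|           date|None|   X|   X|   X|   X|
+---------------+----+----+----+----+----+
"""


def parse_conversion_table(text):
    """Return (headers, cells); cells maps (sql_type, column_index) to a cell string."""
    headers = None
    cells = {}
    for raw in text.splitlines():
        # Drop flake8 markers in case the table was copied from Python source.
        line = raw.split("# noqa")[0].strip()
        if not line.startswith("|"):
            continue  # skip border rows and blank lines
        fields = [field.strip() for field in line.strip("|").split("|")]
        if headers is None:
            headers = fields[1:]  # Python values; fields[0] is the corner label
            continue
        sql_type, values = fields[0], fields[1:]
        for index, cell in enumerate(values):
            cells[(sql_type, index)] = cell
    return headers, cells


def conversion_fails(cells, sql_type, column):
    """True if the table marks this conversion with 'X' (an exception is thrown)."""
    return cells[(sql_type, column)] == "X"


headers, cells = parse_conversion_table(TABLE_EXCERPT)
print(cells[("string", 1)])  # True returned from a string-typed UDF becomes 'true'
print(conversion_fails(cells, "date", 1))
```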
      

    People

        Assignee: Hyukjin Kwon (gurwls223)
        Reporter: Hyukjin Kwon (gurwls223)