Spark / SPARK-41661 Support for User-defined Functions in Python / SPARK-42428

Standardize __repr__ of CommonInlineUserDefinedFunction


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect, PySpark
    • Labels: None

    Description

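      As shown below, in Spark Connect `f(df.id)` evaluates to a Column whose representation is extremely long, because it includes every internal field of the UDF (determinism flag, return type, eval type, the pickled function bytes, and the Python version). Vanilla PySpark, by contrast, returns `Column<'<lambda>(id)'>`. We should standardize `__repr__` of CommonInlineUserDefinedFunction to match.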

      >>> f = udf(lambda x : x + 1)
      >>> df.id
      Column<'id'>
      >>> f(df.id)
      Column<'<lambda>(id), True, "string", 100, b'\x80\x05\x95\xe1\x01\x00\x00\x00\x00\x00\x00\x8c\x1fpyspark.cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x01K\x02KCC\x08|\x00d\x01\x17\x00S\x00\x94NK\x01\x86\x94)\x8c\x01x\x94\x85\x94\x8c\x07<stdin>\x94\x8c\x08<lambda>\x94K\x01C\x00\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94uNNNt\x94R\x94\x8c$pyspark.cloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x16}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0\x8c\x11pyspark.sql.types\x94\x8c\nStringType\x94\x93\x94)\x81\x94\x86\x94.', f3.9'>
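      To illustrate the intended behavior, here is a minimal sketch in plain Python with no Spark dependency. `InlineUDFExpression` and `Column` are hypothetical stand-ins, not the real Spark Connect classes, and their field names only loosely mirror the values visible in the long output above (name, determinism, return type, eval type, pickled command, Python version). The sketch assumes the standardized `__repr__` should render only the function name and its argument columns, matching vanilla PySpark's `Column<'<lambda>(id)'>`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(repr=False)
class InlineUDFExpression:
    """Hypothetical stand-in for an inline Python UDF expression node.

    Fields are illustrative only; they are not the actual Spark Connect layout.
    """
    function_name: str
    arguments: List[str] = field(default_factory=list)
    deterministic: bool = True
    return_type: str = "string"
    eval_type: int = 100
    command: bytes = b""  # pickled function bytes, hidden from the repr
    python_ver: str = "3.9"

    def __repr__(self) -> str:
        # Show only the user-facing call shape, e.g. "<lambda>(id)",
        # and omit internal metadata such as the pickled command.
        return f"{self.function_name}({', '.join(self.arguments)})"


@dataclass(repr=False)
class Column:
    """Hypothetical Column wrapper formatted like PySpark's Column<'...'>."""
    expr: object

    def __repr__(self) -> str:
        return f"Column<'{self.expr!r}'>"


if __name__ == "__main__":
    udf_expr = InlineUDFExpression("<lambda>", ["id"], command=b"\x80\x05")
    print(Column(udf_expr))  # Column<'<lambda>(id)'>
```

      Keeping the pickled bytes and other metadata on the object but out of `__repr__` means the repr stays short and stable regardless of how the function was serialized.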
      

          People

            Assignee: Xinrong Meng
            Reporter: Xinrong Meng