[SPARK-42428] Standardize __repr__ of CommonInlineUserDefinedFunction - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.4.0
Component/s: Connect, PySpark
Labels: None

Description

As shown below, `f(df.id)` evaluates to a Column with an extremely long representation in Connect, because the representation embeds the full serialized (cloudpickled) function. Vanilla PySpark, by contrast, returns `Column<'<lambda>(id)'>`. We should standardize `__repr__` of CommonInlineUserDefinedFunction to match.

>>> f = udf(lambda x : x + 1)
>>> df.id
Column<'id'>
>>> f(df.id)
Column<'<lambda>(id), True, "string", 100, b'\x80\x05\x95\xe1\x01\x00\x00\x00\x00\x00\x00\x8c\x1fpyspark.cloudpickle.cloudpickle\x94\x8c\x0e_make_function\x94\x93\x94(h\x00\x8c\r_builtin_type\x94\x93\x94\x8c\x08CodeType\x94\x85\x94R\x94(K\x01K\x00K\x00K\x01K\x02KCC\x08|\x00d\x01\x17\x00S\x00\x94NK\x01\x86\x94)\x8c\x01x\x94\x85\x94\x8c\x07<stdin>\x94\x8c\x08<lambda>\x94K\x01C\x00\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94N\x8c\x08__name__\x94\x8c\x08__main__\x94uNNNt\x94R\x94\x8c$pyspark.cloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x16}\x94}\x94(h\x13h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x14\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0\x8c\x11pyspark.sql.types\x94\x8c\nStringType\x94\x93\x94)\x81\x94\x86\x94.', f3.9'>
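A minimal sketch of the intended behavior, assuming the UDF expression object keeps its function name and argument expressions separately from the serialized payload. The classes `ColumnReferenceSketch`, `InlineUDFSketch`, and `ColumnSketch` and all of their fields are hypothetical stand-ins, not the actual PySpark Connect classes or the change in the linked pull request. The point is only that `__repr__` prints `name(args)` and omits the pickled command and other payload fields.

```python
from typing import Sequence


class ColumnReferenceSketch:
    """Hypothetical stand-in for an unresolved column reference such as `id`."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return self._name


class InlineUDFSketch:
    """Hypothetical stand-in for Connect's CommonInlineUserDefinedFunction.

    Field names are illustrative assumptions. The serialized payload is
    stored on the object but deliberately left out of __repr__.
    """

    def __init__(
        self,
        function_name: str,
        arguments: Sequence[object],
        deterministic: bool,
        return_type: str,
        eval_type: int,
        command: bytes,
        python_ver: str,
    ) -> None:
        self._function_name = function_name
        self._arguments = list(arguments)
        self._deterministic = deterministic
        self._return_type = return_type
        self._eval_type = eval_type
        self._command = command
        self._python_ver = python_ver

    def __repr__(self) -> str:
        # Print only the function name and its arguments, as vanilla PySpark does.
        args = ", ".join(str(arg) for arg in self._arguments)
        return f"{self._function_name}({args})"


class ColumnSketch:
    """Hypothetical stand-in for a Connect Column wrapping an expression."""

    def __init__(self, expr: object) -> None:
        self._expr = expr

    def __repr__(self) -> str:
        return f"Column<'{self._expr!r}'>"


# Usage: mirror the transcript above with a placeholder payload.
udf_expr = InlineUDFSketch(
    "<lambda>", [ColumnReferenceSketch("id")], True, "string", 100,
    b"<pickled function>", "3.9",
)
print(ColumnSketch(udf_expr))  # Column<'<lambda>(id)'>
```

Keeping the payload on the object but out of `__repr__` means the expression still carries what is needed to build the request, while the printed Column reads the same as in vanilla PySpark.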

Issue Links

links to

[Github] Pull Request #40003 (xinrong-meng)

People

Assignee: Xinrong Meng

Reporter: Xinrong Meng

Votes: 0

Watchers: 3

Dates

Created: 14/Feb/23 03:14

Updated: 14/Feb/23 09:05

Resolved: 14/Feb/23 09:05