[SPARK-27052] Using PySpark udf in transform yields NULL values - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: PySpark, SQL
Labels:
- bulk-closed

Description

Steps to reproduce


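from typing import Optional
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

# Reuse the active session (e.g. in the pyspark shell) or start a new one.
spark = SparkSession.builder.getOrCreate()

def f(x: Optional[int]) -> Optional[int]:
    return x + 1 if x is not None else None

# Register f as a SQL function that returns an integer.
spark.udf.register('f', f, "integer")

# Call the Python UDF inside the lambda of the higher-order function transform.
df = (spark
    .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
    .withColumn("xsinc", expr("transform(xs, x -> f(x))")))

df.show()

# Expected xsinc: [2, 3, 4]. Actual output: every element is NULL (shown as empty).
# +---+---------+-----+
# | id|       xs|xsinc|
# +---+---------+-----+
# |  1|[1, 2, 3]| [,,]|
# +---+---------+-----+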

Source: https://stackoverflow.com/a/53762650
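
A possible workaround, sketched here under assumptions that are not part of the report: rather than calling the Python UDF inside the `transform` lambda, apply a UDF to the whole array column. The names `f_all` and `with_xsinc` are hypothetical, and the pyspark imports sit inside the function only so that the pure-Python helpers can be exercised without a Spark installation.

```python
from typing import List, Optional


def f(x: Optional[int]) -> Optional[int]:
    # Same function as in the report: increment, passing None through.
    return x + 1 if x is not None else None


def f_all(xs: Optional[List[Optional[int]]]) -> Optional[List[Optional[int]]]:
    # Hypothetical helper: apply f to every element of one array value.
    if xs is None:
        return None
    return [f(x) for x in xs]


def with_xsinc(df):
    # Hypothetical helper: add "xsinc" by running f_all as an ordinary
    # column-level UDF, so no Python UDF appears inside a SQL lambda.
    # Assumes df has an array<int> column named "xs".
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import ArrayType, IntegerType

    f_all_udf = udf(f_all, ArrayType(IntegerType()))
    return df.withColumn("xsinc", f_all_udf(col("xs")))
```

With the DataFrame from the report, this would be called as `with_xsinc(spark.createDataFrame([(1, [1, 2, 3])], ("id", "xs"))).show()`. For logic as simple as this increment, the native SQL expression `transform(xs, x -> x + 1)` avoids Python UDFs entirely.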

People

Assignee: Unassigned

Reporter: hejsgpuom62c

Votes: 1

Watchers: 7

Dates

Created: 04/Mar/19 21:45

Updated: 12/Dec/22 18:10

Resolved: 25/May/21 01:44