[SPARK-11725] Let UDF to handle null value - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6.0
Component/s: SQL
Labels:
- releasenotes

Target Version/s: 1.6.0

Description

I noticed that Spark currently treats a null value in a long field as -1 when it is passed to a UDF, so f(age) returns 0 for the null row in the output shown here.
Here is the sample code.

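import org.apache.spark.sql.functions.expr
// f takes a primitive Int, so it cannot observe a null input
sqlContext.udf.register("f", (x: Int) => x + 1)
df.withColumn("age2", expr("f(age)")).show()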

//////////////// Output ///////////////////////
+----+-------+----+
| age|   name|age2|
+----+-------+----+
|null|Michael|   0|
|  30|   Andy|  31|
|  19| Justin|  20|
+----+-------+----+

I think there are three options for handling a null value:

1. Use a special value to represent it (what Spark does now).
2. Always return null if any UDF argument is null.
3. Let the UDF itself handle null.

I would prefer the third option.
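
To make the three options concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is only an illustration, not how Spark implements UDF invocation: a nullable column value is modeled as `Option[Int]` (with `None` standing in for SQL null), and the names `NullHandlingSketch`, `withSentinel`, `nullPropagating`, and `nullAware` are hypothetical.

```scala
// Plain-Scala sketch of the three null-handling options; no Spark required.
// A nullable integer value is modeled as Option[Int] (None = SQL null).
object NullHandlingSketch {
  // The UDF body from the sample code: it only ever sees a primitive Int.
  val f: Int => Int = x => x + 1

  // Option 1: substitute a sentinel for null before calling the UDF.
  // The default of -1 mirrors the behavior described in this report.
  def withSentinel(in: Option[Int], sentinel: Int = -1): Option[Int] =
    Some(f(in.getOrElse(sentinel)))

  // Option 2: a null input short-circuits to a null output; f is never called.
  def nullPropagating(in: Option[Int]): Option[Int] =
    in.map(f)

  // Option 3: pass the possibly-null value to a UDF that decides for itself.
  // This hypothetical UDF maps null to null, but it could choose any result.
  val nullAware: Option[Int] => Option[Int] = {
    case Some(x) => Some(x + 1)
    case None    => None
  }

  def main(args: Array[String]): Unit = {
    // Same ages as the sample table: null, 30, 19.
    val ages: Seq[Option[Int]] = Seq(None, Some(30), Some(19))
    println(ages.map(a => withSentinel(a)))
    println(ages.map(nullPropagating))
    println(ages.map(nullAware))
  }
}
```

With the sentinel approach the null row becomes `Some(0)`, which matches the surprising `0` in the output table; the other two approaches keep the null row as `None`. The difference between options 2 and 3 is who makes that decision: the engine in option 2, the UDF author in option 3.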

Issue Links

relates to

SPARK-20212 UDFs with Option[Primitive Type] don't work as expected (Resolved)

links to

[Github] Pull Request #9770 (cloud-fan)

People

Assignee: Wenchen Fan
Reporter: Jeff Zhang
Votes: 0
Watchers: 10

Dates

Created: 13/Nov/15 10:24
Updated: 05/Apr/17 08:23
Resolved: 18/Nov/15 18:23