Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
Description
I noticed that Spark currently substitutes -1 for a null long field when it is passed to a UDF.
Here is sample code:
sqlContext.udf.register("f", (x: Int) => x + 1)
df.withColumn("age2", expr("f(age)")).show()

//////////////// Output ///////////////////////
+----+-------+----+
| age|   name|age2|
+----+-------+----+
|null|Michael|   0|
|  30|   Andy|  31|
|  19| Justin|  20|
+----+-------+----+
I think for a null value we have 3 options:
- Use a special value to represent it (what Spark does now)
- Always return null if any UDF input argument is null
- Let the UDF itself handle null
I would prefer the third option.
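The three options can be sketched in plain Scala, outside Spark, as three null-handling strategies for the `x + 1` UDF from the example. This is only an illustration of the semantics; the names are hypothetical, and how Spark would actually wire option 3 into `udf.register` (e.g. via `Option[Int]`, which the related SPARK-20212 reports as not working as expected) is an open question here.

```scala
object NullHandlingSketch {
  // Option 1: substitute a sentinel default for null (what the report
  // observes Spark doing: null becomes -1, so f(null) prints 0).
  def withDefault(x: java.lang.Integer): Int =
    (if (x == null) -1 else x.intValue) + 1

  // Option 2: the framework propagates null whenever any argument is null;
  // the function body never sees the null.
  def propagateNull(x: java.lang.Integer): java.lang.Integer =
    if (x == null) null else Integer.valueOf(x.intValue + 1)

  // Option 3: the UDF author handles absence explicitly, e.g. with Option.
  def handleInUdf(x: Option[Int]): Option[Int] = x.map(_ + 1)

  def main(args: Array[String]): Unit = {
    println(withDefault(null))    // 0, matching the output table above
    println(propagateNull(null))  // null
    println(handleInUdf(None))    // None
    println(handleInUdf(Some(30))) // Some(31)
  }
}
```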
Issue Links
- relates to: SPARK-20212 "UDFs with Option[Primitive Type] don't work as expected" (Resolved)
- links to