I've noticed that Spark currently passes a null long field to the UDF as -1.
Here's the sample code.
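A minimal Scala sketch of the setup being described (my own reconstruction, not the original snippet; the UDF name `inc` and column name `value` are invented):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A nullable BIGINT column: the second row is null.
val df = Seq(Some(1L), None).toDF("value")

// A UDF whose parameter is a primitive Long cannot receive null directly,
// so the null row reaches the function body as a sentinel value instead.
spark.udf.register("inc", (x: Long) => x + 1)

df.selectExpr("inc(value)").show()
```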
I think we have three options for handling null values:
- Use a special sentinel value to represent it (what Spark does now)
- Always return null if any argument to the UDF is null
- Let the UDF itself handle null
I would prefer the third option.
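The three options can be sketched in Scala as follows (illustrative only; function and sentinel choices are mine, not Spark's API):

```scala
// Option 1 (current behavior): a primitive parameter cannot hold null,
// so the UDF must know the sentinel and special-case it.
// Fragile: the sentinel may collide with real data.
val opt1 = (x: Long) => if (x == -1L) 0L else x + 1

// Option 2: the engine short-circuits — if any argument is null, the UDF
// is never invoked and the result is null. Nothing to write in the UDF;
// the wrapping would live in the expression evaluator.

// Option 3 (preferred): declare a boxed parameter type so the UDF
// actually sees the null and decides what to do with it.
val opt3 = (x: java.lang.Long) =>
  if (x == null) null else java.lang.Long.valueOf(x + 1)
```

The trade-off with option 3 is boxing overhead on every call, but it is the only variant that lets a UDF distinguish "null input" from any legitimate long value.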