Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Description
The current signature for the pyspark UDF creation function is:
pyspark.sql.functions.udf(f, returnType=StringType)
Is there a reason that returnType has a default value? Returning a string doesn't strike me as so much more frequent a use case than, say, returning an integer that it merits being the default.
In fact, it seems the only reason this default was chosen is that, if we had to choose a default type, it would be StringType, because that's the type everything can be implicitly converted to.
But to me this seems to do only two things: (1) cause unintentional, annoying conversions to strings for new users and (2) make call sites less consistent (if people drop the type specification to actually use the default).
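To make the concern concrete, here is a minimal sketch of a wrapper that forces every call site to state its return type. The name strict_udf is hypothetical and not part of PySpark; the sketch assumes only that pyspark.sql.functions.udf accepts the function and the return type as its first two positional arguments, as in the signature quoted above.

```python
def strict_udf(f, *, returnType):
    """Hypothetical wrapper around pyspark.sql.functions.udf.

    returnType is a required keyword-only argument, so a call site cannot
    silently fall back to the StringType default.
    """
    # Imported lazily (an assumption of this sketch) so that the wrapper can
    # be defined even where PySpark is not on the import path.
    from pyspark.sql.functions import udf

    return udf(f, returnType)


# Intended usage (requires PySpark, so it is left as comments here):
#
#   from pyspark.sql.types import IntegerType
#   add_one = strict_udf(lambda x: x + 1, returnType=IntegerType())
#
# Omitting the type fails immediately with a TypeError instead of
# quietly declaring a string result:
#
#   strict_udf(lambda x: x + 1)
```

With a wrapper like this, the choice of return type is visible at every call site, which addresses point (2), and a forgotten type is reported as an error rather than turning into a string column, which addresses point (1).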