Spark / SPARK-15809

PySpark SQL UDF default returnType


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark

    Description

      The current signature for the pyspark UDF creation function is:

      pyspark.sql.functions.udf(f, returnType=StringType())
      

      Is there a reason that returnType has a default at all? Returning a string doesn't strike me as so much more common than, say, returning an integer that it deserves to be the default.

      In fact, the only reason for this default seems to be that, if a default type had to be chosen, StringType is the natural candidate, because every value can be implicitly converted to a string.

      But as far as I can tell, the default only does two things: (1) it causes unintended, annoying conversions to strings for new users, and (2) it makes call sites less consistent, since some people drop the type specification to rely on the default while others spell it out.


          People

            Assignee: Unassigned
            Reporter: Vladimir Feinberg (vlad.feinberg)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: