Spark / SPARK-21658

Adds the default None for value in na.replace in PySpark to match

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:

      Description

      It looks like na.replace is missing the default value None for its value argument.

      The docs for both methods say they are aliases of each other:
      http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
      http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace

      but their default values differ, so the same call works on one and fails on the other:

      >>> df = spark.createDataFrame([('Alice', 10, 80.0)])
      >>> df.replace({"Alice": "a"}).first()
      Row(_1=u'a', _2=10, _3=80.0)
      >>> df.na.replace({"Alice": "a"}).first()
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: replace() takes at least 3 arguments (2 given)
      

      To take advantage of SPARK-19454, it sounds like we should make the two signatures match.
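
The mismatch can be reproduced without a Spark session. The following is a minimal sketch, not PySpark source: `ToyFrame`, `NaBefore`, and `NaAfter` are hypothetical stand-ins for the DataFrame and its `na` accessor, and the only thing they model is whether `value` has a default.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; holds one row as a tuple."""

    def __init__(self, row):
        self.row = tuple(row)

    def replace(self, to_replace, value=None, subset=None):
        # A dict already carries old -> new pairs, so `value` may stay None.
        # `subset` is accepted but ignored in this toy model.
        if isinstance(to_replace, dict):
            mapping = to_replace
        else:
            mapping = {to_replace: value}
        return ToyFrame(mapping.get(v, v) for v in self.row)


class NaBefore:
    """Alias whose `value` has no default (the reported behaviour)."""

    def __init__(self, df):
        self.df = df

    def replace(self, to_replace, value, subset=None):
        return self.df.replace(to_replace, value, subset)


class NaAfter:
    """Alias with `value=None`, matching ToyFrame.replace."""

    def __init__(self, df):
        self.df = df

    def replace(self, to_replace, value=None, subset=None):
        return self.df.replace(to_replace, value, subset)


df = ToyFrame(('Alice', 10, 80.0))
print(NaAfter(df).replace({"Alice": "a"}).row)
```

With the default in place, the dict-only call goes through the alias the same way it does on the frame itself; calling `NaBefore(df).replace({"Alice": "a"})` raises `TypeError` because the required `value` argument is missing.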

              People

              • Assignee: Chih Han Yu (byakuinss)
              • Reporter: Hyukjin Kwon (hyukjin.kwon)
              • Votes: 0
              • Watchers: 9