Description
It looks like na.replace is missing the default value None.
Both docs say they are aliases:
http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace
but their default values differ, which results in:
>>> df = spark.createDataFrame([('Alice', 10, 80.0)])
>>> df.replace({"Alice": "a"}).first()
Row(_1=u'a', _2=10, _3=80.0)
>>> df.na.replace({"Alice": "a"}).first()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: replace() takes at least 3 arguments (2 given)
To take advantage of SPARK-19454, it sounds like we should make them match.
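For illustration, here is a minimal sketch of the mismatch and of what matching the signatures could look like. It does not use the real PySpark source: FrameSketch, NaFunctionsSketch and replace_mismatched are hypothetical stand-ins, and the replacement logic is reduced to a single tuple of values.

```python
class FrameSketch:
    """Hypothetical stand-in for DataFrame; not the real PySpark class."""

    def __init__(self, row):
        self.row = tuple(row)

    def replace(self, to_replace, value=None, subset=None):
        # With a dict, the mapping carries both old and new values,
        # so `value` can be left at its default. `subset` is ignored here.
        if not isinstance(to_replace, dict):
            to_replace = {to_replace: value}
        return FrameSketch(to_replace.get(v, v) for v in self.row)

    @property
    def na(self):
        return NaFunctionsSketch(self)


class NaFunctionsSketch:
    """Hypothetical stand-in for DataFrameNaFunctions."""

    def __init__(self, df):
        self.df = df

    def replace_mismatched(self, to_replace, value, subset=None):
        # No default for `value`: calling this with only a dict
        # raises TypeError, as in the report above.
        return self.df.replace(to_replace, value, subset)

    def replace(self, to_replace, value=None, subset=None):
        # Same defaults as FrameSketch.replace, so the alias behaves identically.
        return self.df.replace(to_replace, value, subset)


df = FrameSketch(('Alice', 10, 80.0))
same = df.replace({"Alice": "a"}).row == df.na.replace({"Alice": "a"}).row
```

With the defaults aligned, both calls accept a dictionary as the only argument, while the replace_mismatched variant still requires an explicit value.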
Issue Links
- duplicates SPARK-23328 Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary (Resolved)