[SPARK-21658] Adds the default None for value in na.replace in PySpark to match - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: 2.2.0
Fix Version/s: None
Component/s: PySpark
Labels:
- Starter

Description

It looks like na.replace is missing the default None for value that DataFrame.replace has.

Both docs say they are aliases:
http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.replace
http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html#pyspark.sql.DataFrameNaFunctions.replace

but their default values differ, which results in:

>>> df = spark.createDataFrame([('Alice', 10, 80.0)])
>>> df.replace({"Alice": "a"}).first()
Row(_1=u'a', _2=10, _3=80.0)
>>> df.na.replace({"Alice": "a"}).first()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: replace() takes at least 3 arguments (2 given)

To take advantage of ~~SPARK-19454~~, it seems we should make the two signatures match.
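The mismatch and the proposed fix can be modelled without a Spark cluster. Below is a minimal, hypothetical sketch in plain Python (DataFrameSketch and NaFunctionsSketch are illustrative names, not PySpark classes, and only a single row is modelled). The na alias forwards to replace and declares the same default value=None, so passing only a dictionary works through both entry points.

```python
import inspect


class DataFrameSketch:
    """Hypothetical, simplified stand-in for a one-row pyspark.sql.DataFrame."""

    def __init__(self, row):
        self.row = tuple(row)

    @property
    def na(self):
        # Mirrors df.na: returns an object whose replace() is an alias.
        return NaFunctionsSketch(self)

    def replace(self, to_replace, value=None, subset=None):
        # When to_replace is a dict, the mapping supplies the new values,
        # so value may be omitted. `subset` is ignored in this sketch.
        mapping = to_replace if isinstance(to_replace, dict) else {to_replace: value}
        return DataFrameSketch(mapping.get(v, v) for v in self.row)

    def first(self):
        return self.row


class NaFunctionsSketch:
    """Hypothetical stand-in for DataFrameNaFunctions."""

    def __init__(self, df):
        self.df = df

    # The traceback above implies value had no default here, i.e.
    # replace(self, to_replace, value, subset=None). Declaring the same
    # default as DataFrameSketch.replace makes the two aliases agree.
    def replace(self, to_replace, value=None, subset=None):
        return self.df.replace(to_replace, value, subset)


df = DataFrameSketch(("Alice", 10, 80.0))
print(df.replace({"Alice": "a"}).first())     # ('a', 10, 80.0)
print(df.na.replace({"Alice": "a"}).first())  # ('a', 10, 80.0)

# Both entry points now report the same default for `value`.
defaults = [inspect.signature(f).parameters["value"].default
            for f in (DataFrameSketch.replace, NaFunctionsSketch.replace)]
print(defaults)  # [None, None]
```

Delegating instead of duplicating the logic keeps the alias from drifting again: any later change to the default only has to be made in one signature plus the forwarding wrapper.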

Issue Links

duplicates

SPARK-23328 Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary

Resolved
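The linked SPARK-23328 title describes the follow-up rule: an omitted value should default to None only when to_replace is a dictionary. One common way to tell "argument omitted" apart from an explicit None is a sentinel object. The sketch below assumes that approach; _NoValue and resolve_value are hypothetical names used for illustration, not a claim about the merged patch.

```python
class _NoValueType:
    """Hypothetical sentinel meaning 'value was not passed at all'."""

    def __repr__(self):
        return "<no value>"


_NoValue = _NoValueType()


def resolve_value(to_replace, value=_NoValue):
    """Hypothetical helper: decide the effective `value` for replace()."""
    if value is _NoValue:
        if isinstance(to_replace, dict):
            # The dict carries its own replacement values.
            return None
        # Omitting value is an error for non-dict to_replace.
        raise TypeError(
            "value argument is required when to_replace is not a dictionary.")
    # An explicit value, including an explicit None, is passed through as-is.
    return value


print(resolve_value({"Alice": "a"}))  # None
print(resolve_value("Alice", "a"))    # a
print(resolve_value("Alice", None))   # None
try:
    resolve_value("Alice")
except TypeError as exc:
    print("TypeError:", exc)
```

With a plain value=None default the last call could not be distinguished from resolve_value("Alice", None), which is why this sketch uses a sentinel rather than None as the default.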

links to

[Github] Pull Request #18895 (byakuinss)

[Github] Pull Request #20496 (HyukjinKwon)

People

Assignee: Chih Han Yu

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 8

Dates

Created: 07/Aug/17 20:30

Updated: 12/Dec/22 18:11

Resolved: 03/Feb/18 19:05