Description
The current implementation suffers from the following issues:
- It is possible to pass a dict as to_replace, but the value argument can be neither omitted nor set to None, even though it is ignored in that case. This requires passing a "magic" value:
df = sc.parallelize([("Alice", 1, 3.0)]).toDF()
df.replace({"Alice": "Bob"}, 1)
- The code doesn't check whether the provided types are correct. This can lead to an exception in Py4j (harder to diagnose):
df.replace({"Alice": 1}, 1)
or silent failures (with bundled Py4j version):
df.replace({1: 2, 3.0: 4.1, "a": "b"}, 1)
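Both issues could be addressed by validating the arguments on the Python side before anything is handed to Py4j. The following is only a minimal sketch of that idea, not the actual PySpark implementation: the helper name check_replace_args, its error messages, and the rule that a mapping must be either all strings or all numerics are assumptions made for illustration.

```python
# Hypothetical helper, not part of PySpark: it illustrates up-front argument
# checks so that bad input fails in Python rather than inside Py4j.
_SCALARS = (int, float, str)


def _kind(x):
    # Assumption: bool counts as numeric here because bool subclasses int.
    return "string" if isinstance(x, str) else "numeric"


def check_replace_args(to_replace, value=None):
    """Return a validated {old: new} mapping, or raise a descriptive error."""
    if isinstance(to_replace, dict):
        # A dict already carries the replacements, so value may be omitted.
        mapping = dict(to_replace)
    elif isinstance(to_replace, _SCALARS):
        if value is None:
            raise TypeError("value is required when to_replace is not a dict")
        mapping = {to_replace: value}
    else:
        raise TypeError("to_replace should be a bool, int, float, str or dict")

    for old, new in mapping.items():
        if not isinstance(old, _SCALARS) or not isinstance(new, _SCALARS):
            raise TypeError("unsupported replacement: %r -> %r" % (old, new))

    # Reject mappings that mix strings and numerics instead of failing later.
    kinds = {_kind(x) for pair in mapping.items() for x in pair}
    if len(kinds) > 1:
        raise ValueError("mixed type replacements are not supported")
    return mapping
```

Under these assumptions, check_replace_args({"Alice": "Bob"}) succeeds without a "magic" value, while {"Alice": 1} and {1: 2, 3.0: 4.1, "a": "b"} are both rejected with a ValueError raised in Python.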
Issue Links
- is related to SPARK-19453 Correct DataFrame.replace docs (Resolved)