Spark / SPARK-30082

Zeros are being treated as NaNs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
    • Fix Versions: 2.4.5, 3.0.0
    • Component: SQL

    Description

      If you attempt to run

      df = df.replace(float('nan'), somethingToReplaceWith)
      

      it will replace all 0s in columns of integer type, even though those columns cannot contain NaN.

      Example code snippet to reproduce this:

      from pyspark.sql import SQLContext
      spark = SQLContext(sc).sparkSession
      df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
      df.show()
      df = df.replace(float('nan'), 5)
      df.show()
      

      Here's the output I get when I run this code:

      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
            /_/
      
      Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
      SparkSession available as 'spark'.
      >>> from pyspark.sql import SQLContext
      >>> spark = SQLContext(sc).sparkSession
      >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
      >>> df.show()
      +-----+-----+
      |index|value|
      +-----+-----+
      |    1|    0|
      |    2|    3|
      |    3|    0|
      +-----+-----+
      
      >>> df = df.replace(float('nan'), 5)
      >>> df.show()
      +-----+-----+
      |index|value|
      +-----+-----+
      |    1|    5|
      |    2|    3|
      |    3|    5|
      +-----+-----+
      
      >>>
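
      Until a fixed version (2.4.5 / 3.0.0) is available, one possible workaround is to restrict the replacement to floating-point columns via the `subset` parameter of `DataFrame.replace`, so integer columns are never touched. A minimal sketch, assuming the standard `df.dtypes` list of `(name, type)` pairs; the `select_float_columns` helper is illustrative, not part of the Spark API:

      ```python
      def select_float_columns(dtypes):
          """Given DataFrame.dtypes-style (name, type) pairs, keep only
          floating-point columns, the only ones that can hold NaN."""
          return [name for name, t in dtypes if t in ("float", "double")]

      # With a real SparkSession this would be used as:
      #   float_cols = select_float_columns(df.dtypes)
      #   df = df.replace(float('nan'), 5, subset=float_cols)
      # so integer columns like "value" above are left untouched.

      # Example dtypes in the shape returned by df.dtypes:
      example = [("index", "bigint"), ("value", "bigint"), ("score", "double")]
      print(select_float_columns(example))  # ['score']
      ```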
      

      People

            Assignee: jayad John Ayad
            Reporter: jayad John Ayad
            Votes: 0
            Watchers: 2
