Description
The behavior of the JSON option allowNonNumericNumbers is not consistent:
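1. Some NaN and Infinity values are still parsed when the option is set to false: the quoted strings "NaN", "Infinity", and "-Infinity" are read as numbers, while every other form yields null.
2. Some values are parsed differently depending on whether they are quoted or not (see results for positive and negative Infinity): with the option set to true, +INF, -INF, and +Infinity are parsed when unquoted but yield null when quoted.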
Input data
{ "number": "NaN" }
{ "number": NaN }
{ "number": "+INF" }
{ "number": +INF }
{ "number": "-INF" }
{ "number": -INF }
{ "number": "INF" }
{ "number": INF }
{ "number": Infinity }
{ "number": +Infinity }
{ "number": -Infinity }
{ "number": "Infinity" }
{ "number": "+Infinity" }
{ "number": "-Infinity" }
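To reproduce the results, the records have to be in a file that Spark can read. The following is a minimal sketch for creating it, assuming one record per line (the line-delimited layout Spark's JSON reader expects by default) and a local file named nan_valid.json, as used in the read calls. The object name WriteNanInput is hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object WriteNanInput {
  // One JSON record per line, in the same order as the input data listed above.
  val records: Seq[String] = Seq(
    """{ "number": "NaN" }""",
    """{ "number": NaN }""",
    """{ "number": "+INF" }""",
    """{ "number": +INF }""",
    """{ "number": "-INF" }""",
    """{ "number": -INF }""",
    """{ "number": "INF" }""",
    """{ "number": INF }""",
    """{ "number": Infinity }""",
    """{ "number": +Infinity }""",
    """{ "number": -Infinity }""",
    """{ "number": "Infinity" }""",
    """{ "number": "+Infinity" }""",
    """{ "number": "-Infinity" }"""
  )

  def main(args: Array[String]): Unit = {
    // Assumption: a path on the local filesystem; adjust for HDFS or other storage.
    val path = Paths.get("nan_valid.json")
    val content = records.mkString("", "\n", "\n")
    Files.write(path, content.getBytes(StandardCharsets.UTF_8))
  }
}
```

In spark-shell, the object can be pasted and run with WriteNanInput.main(Array.empty) before issuing the reads.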
Setup
import org.apache.spark.sql.types._
val schema = StructType(Seq(StructField("number", DataTypes.FloatType, false)))
allowNonNumericNumbers = false
val df = spark.read.format("json").schema(schema).option("allowNonNumericNumbers", "false").json("nan_valid.json")
df.show
+---------+
|   number|
+---------+
|      NaN|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
| Infinity|
|     null|
|-Infinity|
+---------+
allowNonNumericNumbers = true
val df = spark.read.format("json").schema(schema).option("allowNonNumericNumbers", "true").json("nan_valid.json")
df.show
+---------+
|   number|
+---------+
|      NaN|
|      NaN|
|     null|
| Infinity|
|     null|
|-Infinity|
|     null|
|     null|
| Infinity|
| Infinity|
|-Infinity|
| Infinity|
|     null|
|-Infinity|
+---------+
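
The two result tables are easier to compare when each row is lined up with its input. The sketch below does that in plain Scala (no Spark required): it copies the values from the two df.show outputs above and derives which inputs exhibit each of the two reported problems. The object and value names are hypothetical, and the data is transcribed from this report rather than produced by running Spark.

```scala
object NonNumericSummary {
  // Input tokens in file order; entries wrapped in quotes are JSON strings,
  // the rest are bare (unquoted) tokens.
  val inputs: Seq[String] = Seq(
    "\"NaN\"", "NaN", "\"+INF\"", "+INF", "\"-INF\"", "-INF", "\"INF\"", "INF",
    "Infinity", "+Infinity", "-Infinity",
    "\"Infinity\"", "\"+Infinity\"", "\"-Infinity\"")

  // Row-by-row results with allowNonNumericNumbers = false, copied from the report.
  val whenFalse: Seq[String] = Seq(
    "NaN", "null", "null", "null", "null", "null", "null", "null",
    "null", "null", "null", "Infinity", "null", "-Infinity")

  // Row-by-row results with allowNonNumericNumbers = true, copied from the report.
  val whenTrue: Seq[String] = Seq(
    "NaN", "NaN", "null", "Infinity", "null", "-Infinity", "null", "null",
    "Infinity", "Infinity", "-Infinity", "Infinity", "null", "-Infinity")

  // Problem 1: inputs that still produce a value although the option is false.
  val parsedWhenDisabled: Seq[String] =
    inputs.zip(whenFalse).collect { case (in, out) if out != "null" => in }

  // Problem 2: bare tokens whose quoted form gives a different result
  // when the option is true.
  private val resultByInput: Map[String, String] = inputs.zip(whenTrue).toMap
  val quotedMismatch: Seq[String] =
    inputs.filterNot(_.startsWith("\"")).filter { bare =>
      resultByInput.get("\"" + bare + "\"").exists(_ != resultByInput(bare))
    }

  def main(args: Array[String]): Unit = {
    println("Parsed with option = false: " + parsedWhenDisabled.mkString(", "))
    println("Quoted/unquoted mismatch with option = true: " + quotedMismatch.mkString(", "))
  }
}
```

For the data in this report, parsedWhenDisabled contains the quoted forms "NaN", "Infinity", and "-Infinity", and quotedMismatch contains +INF, -INF, and +Infinity.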