Details
- Type: Improvement
- Status: Resolved
- Priority: Trivial
- Resolution: Incomplete
Description
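import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// Three rows; the first column of the last row is NaN.
val denseData = Array(
  Vectors.dense(3.8, 0.0, 1.8),
  Vectors.dense(1.7, 0.9, 0.0),
  Vectors.dense(Double.NaN, 0.0, 0.0)
)
// sc is an existing SparkContext (for example, the one provided by spark-shell).
val rdd = sc.parallelize(denseData)
// The NaN propagates into the mean of the first column.
println(Statistics.colStats(rdd).mean)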
[NaN,0.3,0.6]
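This is a proposal for discussion on how Statistics.colStats should handle NaN values in the input vectors. There are two options: ignore NaN values in the computation, or keep the current behavior and output NaN for the affected column, which serves as a warning that the data contains missing values.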
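To make the "ignore NaN" option concrete, here is a minimal sketch in plain Scala, without Spark, of a column mean that skips NaN entries. The object NaNAwareMean and the method colMeansIgnoringNaN are hypothetical names used only for illustration; they are not part of MLlib, and this is not how colStats is implemented.

```scala
// Hypothetical helper, not part of Spark MLlib: column means that skip NaN entries.
object NaNAwareMean {
  /** Per-column mean over the non-NaN entries.
    * A column with no valid entries yields NaN. */
  def colMeansIgnoringNaN(rows: Seq[Array[Double]]): Array[Double] = {
    require(rows.nonEmpty, "need at least one row")
    val n = rows.head.length
    val sums = new Array[Double](n)   // running sum of valid entries per column
    val counts = new Array[Long](n)   // number of valid entries per column
    for (row <- rows) {
      require(row.length == n, "all rows must have the same length")
      var j = 0
      while (j < n) {
        if (!row(j).isNaN) {
          sums(j) += row(j)
          counts(j) += 1
        }
        j += 1
      }
    }
    Array.tabulate(n)(j => if (counts(j) == 0L) Double.NaN else sums(j) / counts(j))
  }
}
```

Applied to the three rows above, the first column would be averaged over its two valid entries (3.8 and 1.7) instead of returning NaN, while the other two columns are unchanged. Because each column keeps its own count, the denominators can differ between columns, which is one reason this behavior may be surprising as a silent default.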
Issue Links
- is related to SPARK-13568 Create feature transformer to impute missing values (Resolved)