Spark / SPARK-11569

StringIndexer transform fails when column contains nulls

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0, 1.6.0
    • Fix Version/s: 2.2.0
    • Component/s: ML, PySpark
    • Labels:
      None

      Description

      Transforming a column containing null values with StringIndexer results in a java.lang.NullPointerException:

      from pyspark.ml.feature import StringIndexer
      
      df = sqlContext.createDataFrame([("a", 1), (None, 2)], ("k", "v"))
      df.printSchema()
      ## root
      ##  |-- k: string (nullable = true)
      ##  |-- v: long (nullable = true)
      
      indexer = StringIndexer(inputCol="k", outputCol="kIdx")
      
      indexer.fit(df).transform(df)
      ## <repr(<pyspark.sql.dataframe.DataFrame at 0x7f4b0d8e7110>) failed: py4j.protocol.Py4JJavaError: An error occurred while calling o75.json.
      ## : java.lang.NullPointerException
      

      The problem disappears when we drop the rows containing nulls:

      df1 = df.na.drop()
      indexer.fit(df1).transform(df1)
      

      or replace the nulls with a placeholder:

      from pyspark.sql.functions import col, when
      
      k = col("k")
      df2 = df.withColumn("k", when(k.isNull(), "__NA__").otherwise(k))
      indexer.fit(df2).transform(df2)
      

      and cannot be reproduced using the Scala API:

      import org.apache.spark.ml.feature.StringIndexer
      
      val df = sc.parallelize(Seq(("a", 1), (null, 2))).toDF("k", "v")
      df.printSchema
      // root
      //  |-- k: string (nullable = true)
      //  |-- v: integer (nullable = false)
      
      val indexer = new StringIndexer().setInputCol("k").setOutputCol("kIdx")
      
      indexer.fit(df).transform(df).count
      // 2
      


          Activity

          jliwork Jia Li added a comment -

          I'm working on a PR to fix this.

          zero323 Maciej Szymkiewicz added a comment -

          It looks like this problem affects Scala after all:

          
          val df = sqlContext.createDataFrame(
            Seq(("asd2s","1e1e",1.1,0), ("asd2s","1e1e",0.1,0), 
                (null,"1e3e",1.2,0), ("bd34t","1e1e",5.1,1), 
                ("asd2s","1e3e",0.2,0), ("bd34t","1e2e",4.3,1))
          ).toDF("x0","x1","x2","x3")
          val indexer = new StringIndexer().setInputCol("x0").setOutputCol("x0idx")
          
          indexer.fit(df).transform(df).show
          // java.lang.NullPointerException
          //	at org.apache.spark.sql.types.Metadata$.org$apache$spark$sql$types$Metadata$$hash(Metadata.scala:208)
          //	at org.apache.spark.sql.types.Metadata$$anonfun$org$apache$spark$sql$types$Metadata$$hash$2.apply(Metadata.scala:196)
          //	at org.apache.spark.sql.types.Metadata$$anonfun$org$apache$spark$sql$types$Metadata$$hash$2.apply(Metadata.scala:196)
          //	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
          

          Source: http://stackoverflow.com/q/33574807/1560062

          jliwork Jia Li added a comment -

          Hi Joseph K. Bradley, Holden Karau,

          I'd like to hear your opinion on the expected behavior for this test case. I can think of these possibilities:

          1) the tuple with null gets the last index as shown below

          x0    | x1   | x2  | x3 | x0idx
          ------+------+-----+----+------
          asd2s | 1e1e | 1.1 | 0  | 0.0
          asd2s | 1e1e | 0.1 | 0  | 0.0
          null  | 1e3e | 1.2 | 0  | 2.0
          bd34t | 1e1e | 5.1 | 1  | 1.0
          asd2s | 1e3e | 0.2 | 0  | 0.0
          bd34t | 1e2e | 4.3 | 1  | 1.0

          2) the tuple with null gets index 0 before everything else
          3) eliminate the tuple from the result

          Which one do you prefer?

          Thanks,
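          For reference, option (1) can be sketched in plain Python. This is only a toy model of StringIndexer's frequency-descending ordering with nulls appended last, not Spark code; `assign_indices` and the alphabetical tie-break are choices invented here for determinism:

          ```python
          from collections import Counter

          def assign_indices(values):
              # Mimic StringIndexer's default ordering: labels sorted by
              # descending frequency (ties broken alphabetically here),
              # then map each value to its index. None values receive the
              # last index, i.e. option (1) above.
              counts = Counter(v for v in values if v is not None)
              labels = sorted(counts, key=lambda lbl: (-counts[lbl], lbl))
              index = {lbl: float(i) for i, lbl in enumerate(labels)}
              return [index[v] if v is not None else float(len(labels))
                      for v in values]

          x0 = ["asd2s", "asd2s", None, "bd34t", "asd2s", "bd34t"]
          print(assign_indices(x0))  # [0.0, 0.0, 2.0, 1.0, 0.0, 1.0]
          ```

          With the data from the table above this reproduces the proposed x0idx column: asd2s (3 occurrences) maps to 0.0, bd34t (2) to 1.0, and null gets the last index 2.0.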

          apachespark Apache Spark added a comment -

          User 'jliwork' has created a pull request for this issue:
          https://github.com/apache/spark/pull/9709

          josephkb Joseph K. Bradley added a comment -

          To choose the right API, my first comments are:

          • What do other libraries do when given null/bad values? (scikit-learn and R are the ones I tend to look at.)
          • I'd prefer to make the behavior adjustable using an option with a default. The default I'd vote for is throwing a nice error upon seeing null, though I could be convinced to go for another.
          • When we do index null, we should ideally maintain current indexing behavior, so it may make the most sense to put null at the end.
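          The adjustable-option idea above can be sketched in plain Python. This is a hypothetical illustration only; `handle_null` and its values are names invented here, not the API Spark eventually shipped:

          ```python
          from collections import Counter

          def string_index(values, handle_null="error"):
              # handle_null is a hypothetical knob illustrating the proposal:
              #   "error" -> raise on null (the suggested default)
              #   "skip"  -> drop null entries from the output
              #   "keep"  -> assign nulls the last index
              counts = Counter(v for v in values if v is not None)
              labels = sorted(counts, key=lambda lbl: (-counts[lbl], lbl))
              index = {lbl: float(i) for i, lbl in enumerate(labels)}
              out = []
              for v in values:
                  if v is None:
                      if handle_null == "error":
                          raise ValueError("null value seen during transform")
                      if handle_null == "skip":
                          continue
                      out.append(float(len(labels)))  # "keep": null at the end
                  else:
                      out.append(index[v])
              return out
          ```

          Putting null at the end means indices of the seen labels are identical to today's behavior regardless of the option chosen.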
          apachespark Apache Spark added a comment -

          User 'jliwork' has created a pull request for this issue:
          https://github.com/apache/spark/pull/9920

          timhunter Timothy Hunter added a comment -

          Also, I suggest looking at Pandas' indexers, which have the same issue to deal with.

          barrybecker4 Barry Becker added a comment -

          Null should somehow be treated as separate from the other known values.
          If the index cannot be maintained as null, then my second choice would be for it to be some sort of special value like -1.
          ML algorithms that operate on these index values should be able to differentiate null values from known values.
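          The sentinel idea can be illustrated with a small plain-Python sketch (`index_with_sentinel` is a name invented here, not Spark code):

          ```python
          def index_with_sentinel(values, ordered_labels):
              # ordered_labels: known labels in their index order;
              # nulls map to the sentinel -1.0 instead of a real index,
              # so downstream consumers can tell them apart.
              index = {lbl: float(i) for i, lbl in enumerate(ordered_labels)}
              return [index[v] if v is not None else -1.0 for v in values]

          print(index_with_sentinel(["a", None, "b"], ["a", "b"]))
          # [0.0, -1.0, 1.0]
          ```

          A downside of -1.0 is that it is not a valid categorical index, so any algorithm consuming the column must know to special-case it.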

          imatiach Ilya Matiach added a comment -

          @jliwork @srowen are you currently working on this in-progress JIRA 11569? If not, I would be interested in continuing the initial pull request that was closed. Please let me know, thank you!

          josephkb Joseph K. Bradley added a comment (edited) -

          Hi all, I'm sorry for not following up on this, but I would like us to do this at some point. However, I will insist that we do some research before adding an API based on just a few users' requirements. Have you looked at other libraries?

          • scikit-learn -> LabelIndexer does not seem to handle null values
          • various R libraries
          • other more specialized but popular ML libraries
          apachespark Apache Spark added a comment -

          User 'crackcell' has created a pull request for this issue:
          https://github.com/apache/spark/pull/17233

          josephkb Joseph K. Bradley added a comment -

          Issue resolved by pull request 17233
          https://github.com/apache/spark/pull/17233

          josephkb Joseph K. Bradley added a comment -

          Linking SPARK-19852, which can update the Python API.


            People

            • Assignee: crackcell Menglong TAN
            • Reporter: zero323 Maciej Szymkiewicz
            • Shepherd: Joseph K. Bradley
            • Votes: 3
            • Watchers: 13
