[SPARK-19940] FPGrowthModel.transform should skip duplicated items - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 2.2.0
Component/s: ML
Labels: None

Description

Due to a misplaced distinct, FPGrowthModel.transform generates duplicated items in the "prediction" column:

scala> val data = spark.read.text("data/mllib/sample_fpgrowth.txt").select(split($"value", "\\s+").alias("features")) 
data: org.apache.spark.sql.DataFrame = [features: array<string>]

scala> fpm.transform(Seq(Array("t", "s")).toDF("features")).show(1, false)
+--------+---------------------+
|features|prediction           |
+--------+---------------------+
|[t, s]  |[y, x, z, x, y, x, z]|
+--------+---------------------+
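The effect of a misplaced distinct can be sketched with plain Scala collections, without Spark. This is only an illustration under stated assumptions: the rules below are hypothetical (they are not mined from sample_fpgrowth.txt), the names DistinctSketch, matching, predictMisplaced and predictDeduplicated are invented for this example, and the if/else placement shown is one plausible way such a slip can happen, not necessarily the code touched by the fix.

```scala
object DistinctSketch {
  // Hypothetical association rules as (antecedent, consequent) pairs.
  val rules: Seq[(Seq[String], Seq[String])] = Seq(
    (Seq("t"), Seq("y")),
    (Seq("t"), Seq("x")),
    (Seq("s"), Seq("x")),
    (Seq("t", "s"), Seq("y"))
  )

  // Collect the consequents of every rule whose antecedent is contained in
  // `items`, dropping items that are already present in the input.
  def matching(items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { case (antecedent, consequent) =>
      if (antecedent.forall(itemset.contains)) consequent.filterNot(itemset.contains)
      else Seq.empty[String]
    }
  }

  // Misplaced: `.distinct` binds only to the else branch, so the
  // non-null branch is returned with its duplicates intact.
  def predictMisplaced(items: Seq[String]): Seq[String] =
    if (items != null) matching(items)
    else { Seq.empty[String] }.distinct

  // Intended: deduplicate the flattened consequents themselves.
  def predictDeduplicated(items: Seq[String]): Seq[String] =
    if (items != null) matching(items).distinct
    else Seq.empty[String]
}
```

With these hypothetical rules, DistinctSketch.predictMisplaced(Seq("t", "s")) yields y, x, x, y, while DistinctSketch.predictDeduplicated(Seq("t", "s")) yields y, x. Seq.distinct keeps the first occurrence of each element, so the order of the remaining items is preserved.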

Issue Links

is related to

SPARK-14503 spark.ml Scala API for FPGrowth

Resolved

links to

[Github] Pull Request #17283 (zero323)

People

Assignee: Maciej Szymkiewicz

Reporter: Maciej Szymkiewicz

Votes: 0

Watchers: 3

Dates

Created: 13/Mar/17 22:55

Updated: 16/May/17 09:53

Resolved: 14/Mar/17 14:35