Spark / SPARK-19940

FPGrowthModel.transform should skip duplicated items

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: ML
    • Labels: None

    Description

      Due to a misplaced distinct, FPGrowthModel.transform generates duplicated items in the "prediction" column:

      scala> val data = spark.read.text("data/mllib/sample_fpgrowth.txt").select(split($"value", "\\s+").alias("features")) 
      data: org.apache.spark.sql.DataFrame = [features: array<string>]
      
      scala> // fpm is an FPGrowthModel fitted on data; the fitting call is missing from this report
      
      scala> fpm.transform(Seq(Array("t", "s")).toDF("features")).show(1, false)
      +--------+---------------------+
      |features|prediction           |
      +--------+---------------------+
      |[t, s]  |[y, x, z, x, y, x, z]|
      +--------+---------------------+
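
The effect of where distinct is placed can be illustrated with plain Scala collections, without Spark. This is a minimal sketch, not the actual FPGrowthModel code: the object name, the method names, and the rules are all hypothetical and are not taken from the sample data above. It only shows how a distinct that binds to the wrong branch of an if/else leaves the flatMap result unfiltered.

```scala
object DistinctPlacement {
  // Hypothetical association rules (antecedent, consequent); illustrative only.
  val rules: Seq[(Seq[String], Seq[String])] = Seq(
    (Seq("t"), Seq("y")),
    (Seq("s"), Seq("x")),
    (Seq("t", "s"), Seq("y")),
    (Seq("t", "s"), Seq("x", "z"))
  )

  // Consequents of every rule whose antecedent is contained in `items`,
  // excluding items that are already present in the input.
  private def matches(items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { case (antecedent, consequent) =>
      if (antecedent.forall(itemset.contains)) consequent.filterNot(itemset.contains)
      else Seq.empty[String]
    }
  }

  // Misplaced: distinct applies only to the empty else branch,
  // so overlapping rules contribute the same item more than once.
  def predictMisplaced(items: Seq[String]): Seq[String] =
    if (items != null) matches(items)
    else Seq.empty[String].distinct

  // Intended: distinct applies to the collected consequents.
  def predictFixed(items: Seq[String]): Seq[String] =
    if (items != null) matches(items).distinct
    else Seq.empty[String]
}
```

With these hypothetical rules, DistinctPlacement.predictMisplaced(Seq("t", "s")) yields y, x, y, x, z, while DistinctPlacement.predictFixed(Seq("t", "s")) yields y, x, z.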
      
      

            People

              Assignee: Maciej Szymkiewicz (zero323)
              Reporter: Maciej Szymkiewicz (zero323)
              Votes: 0
              Watchers: 3
