  1. Spark
  2. SPARK-23269

FP-growth: Provide last transaction for each detected frequent pattern


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 2.2.1
    • Fix Version/s: None
    • Component/s: ML
    • Labels:

      Description

      The FP-growth implementation returns the frequent patterns and their frequencies:

      model.freqItemsets:

      items freq
      [5] 3
      [5, 1] 3

      It would be useful to know when each pattern last occurred, that is, the most recent transaction containing the pattern.

      To support this, FPGrowth would need to be told which column of the transactions data frame holds the timestamp:

      val fpgrowth = new FPGrowth()
        .setItemsCol("items")
        .setTimestampCol("timestamp")
      

      The resulting data frame of patterns could then look like:

      items freq lastOccurrence
      [5] 3 2018-01-01 12:15:00
      [5, 1] 3 2018-01-01 12:15:00

      Without this functionality, the user has to traverse the transactions data frame again with the set of detected patterns to determine the last transaction for each pattern. Why traverse the transactions a second time when FP-growth has already done so during its execution?
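For illustration, the extra traversal described above could be sketched as follows. This is a minimal sketch, not part of the ticket: the sample transactions, the `minSupport` value, the `containsAll` helper, and the `pattern` column name are assumptions made up for the example, and it uses only the existing `FPGrowth` API (no `setTimestampCol`).

```scala
import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, to_timestamp, udf}

object LastOccurrenceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fpgrowth-last-occurrence")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: item ids plus one timestamp per transaction.
    val transactions = Seq(
      (Seq(1, 2, 5), "2018-01-01 12:00:00"),
      (Seq(1, 5), "2018-01-01 12:10:00"),
      (Seq(1, 2, 3, 5), "2018-01-01 12:15:00")
    ).toDF("items", "timestamp")
      .withColumn("timestamp", to_timestamp(col("timestamp")))

    // FPGrowth ignores the timestamp column; it only reads the items column.
    val model = new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.5) // assumed threshold for the example
      .fit(transactions)

    // freqItemsets has columns `items` and `freq`; rename `items` so it
    // does not clash with the transactions column of the same name.
    val patterns = model.freqItemsets.withColumnRenamed("items", "pattern")

    // Hypothetical helper: true when every item of the pattern
    // appears in the transaction.
    val containsAll = udf { (pattern: Seq[Int], items: Seq[Int]) =>
      pattern.forall(items.contains)
    }

    // Second pass over the transactions: pair every pattern with every
    // transaction, keep the matching pairs, take the latest timestamp.
    val lastOccurrence = patterns
      .crossJoin(transactions)
      .where(containsAll(col("pattern"), col("items")))
      .groupBy("pattern", "freq")
      .agg(max("timestamp").as("lastOccurrence"))

    lastOccurrence.show(truncate = false)
    spark.stop()
  }
}
```

The cross join compares every pattern with every transaction, so its cost grows with the product of the two row counts. That second pass is the overhead this request aims to avoid.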

       

            People

            • Assignee:
              Unassigned
              Reporter:
              tashoyan Arseniy Tashoyan
            • Votes:
              0
              Watchers:
              1

                Time Tracking

                Estimated: 120h
                Remaining: 120h
                Logged: Not Specified