
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Find frequent items, possibly including false positives, using the algorithm described in http://www.cs.umd.edu/~samir/498/karp.pdf.

      df.stat.freqItems(cols: Array[String], support: Double = 0.001): DataFrame
      

      The output is a local DataFrame with the same column names as the input. The first version will implement the single-pass algorithm, which may return false positives but never false negatives.
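
      To illustrate the single-pass idea, here is a minimal sketch in plain Scala of a counter-based frequent-items pass in the style of the linked paper. It is not Spark's implementation: the name freqItemsSketch, its signature, and the choice of ceil(1 / support) - 1 counters are assumptions made for this example. Any item whose relative frequency exceeds support is guaranteed to be returned; other returned items may be false positives.

```scala
import scala.collection.mutable

// Hypothetical helper for illustration only; not part of Spark.
// Keeps at most ceil(1 / support) - 1 counters, so memory is bounded
// by the support threshold rather than by the number of distinct items.
def freqItemsSketch[T](items: Iterator[T], support: Double): Set[T] = {
  require(support > 0.0 && support <= 1.0, "support must be in (0, 1]")
  val maxCounters = math.ceil(1.0 / support).toInt - 1
  val counts = mutable.Map.empty[T, Long]

  for (item <- items) {
    if (counts.contains(item)) {
      // Already tracked: bump its counter.
      counts(item) += 1
    } else if (counts.size < maxCounters) {
      // Free slot available: start tracking this item.
      counts(item) = 1L
    } else {
      // No free slot: decrement every counter and drop those that reach zero.
      // The new item itself is not stored.
      for (key <- counts.keys.toList) {
        val decremented = counts(key) - 1
        if (decremented == 0) counts.remove(key) else counts(key) = decremented
      }
    }
  }

  // Survivors are candidates only; a second pass over the data would be
  // needed to remove false positives.
  counts.keySet.toSet
}

// "a" occurs in 5 of 8 positions, which exceeds support = 0.5.
val frequent = freqItemsSketch(Iterator("a", "b", "a", "c", "a", "a", "d", "a"), 0.5)
println(frequent) // prints Set(a)
```

      With the input x, y, z and support = 0.5 the same function returns Set(z), even though z occurs only once: that is the false-positive case the description mentions.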

          People

            brkyvz Burak Yavuz
            mengxr Xiangrui Meng