  1. Mahout
  2. MAHOUT-625

Some generated patterns have support higher than their actual support

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4
    • Fix Version/s: 0.5
    • Component/s: None
    • Labels:
      None

      Description

      It turns out that some of the generated patterns have incorrect support. The returned support is slightly higher than the true one.
      I attached a test which demonstrates that FPGrowth has a bug. The test uses the retail dataset found here: http://fimi.ua.ac.be/data/
      The pattern (36, 39, 41) occurs in 572 transactions (the test also computes this count), but FPGrowth returns the pattern (36, 39, 41) with support 573.
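
      The true support mentioned above can be checked independently of FPGrowth with a brute-force scan. The following is only a minimal sketch, not the code from the attached test: the class name SupportCounter and the method countSupport are hypothetical, and it assumes each transaction is one line of whitespace-separated integer item IDs. The sample transactions in main are made up for illustration and are not taken from the retail dataset.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Mahout: counts the true support of a
// pattern by scanning every transaction.
class SupportCounter {

    /**
     * Returns the number of transactions that contain every item of the pattern.
     * Assumes one transaction per line, items as whitespace-separated integers.
     */
    static long countSupport(List<String> transactions, Set<Integer> pattern) {
        long support = 0;
        for (String line : transactions) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            Set<Integer> items = new HashSet<>();
            for (String token : trimmed.split("\\s+")) {
                items.add(Integer.parseInt(token));
            }
            if (items.containsAll(pattern)) {
                support++;
            }
        }
        return support;
    }

    public static void main(String[] args) {
        // Made-up sample transactions, for illustration only.
        List<String> transactions = Arrays.asList(
                "36 38 39 41",
                "39 41 48",
                "36 39 41 48",
                "36 39");
        Set<Integer> pattern = new HashSet<>(Arrays.asList(36, 39, 41));
        System.out.println(countSupport(transactions, pattern)); // prints 2
    }
}
```

      Comparing a count obtained this way with the support reported by FPGrowth for the same pattern would expose the kind of discrepancy described here.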

      Please note that this pattern is not the only one with incorrect support; the test points out just one example to have something to focus on. There are plenty more patterns with support higher than the real one. The largest difference I noticed was a support 8 higher than the real value for one of the patterns.

      Please find the failing unit test attached. It is a Maven project that contains the test data and is ready to run.

        Attachments

        1. FPGrowth.java
          34 kB
          niu
        2. final_patch_with_bug_fix_test_and_the_dataset.txt
          4.17 MB
          Jaroslaw Odzga
        3. dataset_ok.txt
          1 kB
          Jaroslaw Odzga
        4. bugfix-patch.txt
          0.8 kB
          Jaroslaw Odzga
        5. MAHOUT-625-patch.txt
          4.18 MB
          Jaroslaw Odzga
        6. mahout-test.zip
          1.47 MB
          Jaroslaw Odzga

            People

            • Assignee:
              robinanil Robin Anil
              Reporter:
              jarek Jaroslaw Odzga
