[MAHOUT-890] Performance issue in FPGrowth - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.6
Fix Version/s: 0.6
Component/s: None
Labels: None

Description

I've encountered a dataset that suggests there is a performance bug in the FPGrowth implementation. This set may be a somewhat unusual target for FPG: there is a relatively modest number of itemsets, and many items with a Zipf-like distribution. I am attaching a patch (addSynth.patch) that adds a similar dataset as core/src/test/resources/FPGsynth.dat.

FPGsynth.dat can take minutes or a few hours to process, depending on how it is grouped out to machines. If run in sequential mode, or with "-g 50" it will take considerable time. Most reducers/"anchor items" are processed quickly, but a small number take a handful of minutes, and one or two take a long time. If you experiment with this data, I suggest using '-s 50 -regex "[ ]+"'.
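To illustrate what the suggested '[ ]+' splitter pattern does to a transaction line, here is a minimal Java sketch. The class and method names are hypothetical and not part of Mahout, and the trim() call is an assumption of this sketch rather than something the FPGrowth driver is known to do.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Mahout code: shows how the "[ ]+" pattern
// breaks one space-separated transaction line into items.
final class TransactionSplitSketch {
    // One or more consecutive spaces act as a single separator.
    private static final Pattern SPLITTER = Pattern.compile("[ ]+");

    static String[] items(String line) {
        // Assumption of this sketch: trim first, so leading blanks
        // do not produce an empty first item.
        return SPLITTER.split(line.trim());
    }

    public static void main(String[] args) {
        for (String item : items("7  12 3   41")) {
            System.out.println(item);
        }
    }
}
```

Runs of several spaces collapse into one separator, so irregularly spaced lines yield the same items as single-spaced ones.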

Digging into this, I've found that the tree pruning code sometimes creates surprising trees. One oddity I've observed is 0-count nodes, sometimes with non-zero children. The other is that subtrees sometimes seem to get repeated. I'm attaching a sample input file (smallexample.dat; use the whitespace regex with this one, too) and a patch that adds some logging to pruneFPTree and growthBottomUp, which prints out some interesting trees when run with the smallexample.dat input.
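The oddities described above can be stated as invariant checks. The following is a rough sketch on a hypothetical Node class, not Mahout's FPTree, which stores its nodes differently. It assumes that in a well-formed FP-tree a node's count is at least the sum of its children's counts, and that a repeated subtree would show up as two siblings carrying the same item; the second point is a guess at the symptom, not something confirmed by the logs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an FP-tree; names are illustrative only.
final class TreeCheckSketch {
    static final class Node {
        final int item;
        final long count;
        final List<Node> children = new ArrayList<>();

        Node(int item, long count) {
            this.item = item;
            this.count = count;
        }

        // Adds a child and returns this node so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns one message per violated invariant; empty if the tree looks sane.
    static List<String> violations(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, true, out);
        return out;
    }

    private static void walk(Node node, boolean isRoot, List<String> out) {
        long childSum = 0;
        Set<Integer> seen = new HashSet<>();
        for (Node child : node.children) {
            childSum += child.count;
            // Two siblings with the same item suggest a repeated subtree.
            if (!seen.add(child.item)) {
                out.add("duplicate child item " + child.item + " under item " + node.item);
            }
        }
        // Every transaction counted in a child also passes through its parent,
        // so a 0-count node with non-zero children fails this check.
        // The root's own count is not checked.
        if (!isRoot && node.count < childSum) {
            out.add("item " + node.item + " has count " + node.count
                    + " but its children sum to " + childSum);
        }
        for (Node child : node.children) {
            walk(child, false, out);
        }
    }
}
```

A check like this could be run on the trees printed by the logging patch to flag the suspicious ones automatically.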

Attachments


smallexample.dat (0.1 kB), 20/Nov/11 03:41, Tom Pierce
simpleFPG.patch (38 kB), 22/Nov/11 22:16, Tom Pierce
MAHOUT-890-3.patch (713 kB), 31/Dec/11 01:22, Tom Pierce
MAHOUT-890-2.patch (103 kB), 30/Dec/11 21:38, Tom Pierce
MAHOUT-890.patch (91 kB), 03/Dec/11 23:20, Tom Pierce
logtrees.patch (1 kB), 20/Nov/11 03:41, Tom Pierce
addSynth.patch (588 kB), 20/Nov/11 03:41, Tom Pierce

Issue Links

relates to

MAHOUT-920 Remove a mapreduce job from parallel FPGrowth workflow (Closed)

People

Assignee: Robin Anil
Reporter: Tom Pierce
Votes: 0
Watchers: 3

Dates

Created: 20/Nov/11 03:39
Updated: 31/Mar/15 22:49
Resolved: 17/Jan/12 05:20