Thanks for the feedback. I agree with your thinking on tests, but I'm not sure what makes the most sense here. I have an example set where the current implementation seems to produce correct output; it just takes an unreasonable amount of time (4+ hours on a small set). I'll be happy to provide a unit test, though. Unit testing the tree manipulations on a smaller scale is problematic because their intent and invariants are not obvious.
For what it's worth, I actually spent more time than I'd like to admit trying to understand and fix the current implementation before resorting to a rewrite.
My motivation for writing an alternate implementation is that I have a real dataset (of reasonable size) on which the current implementation effectively fails to complete. So far I've submitted a bug report, example data that demonstrates the long-execution problem, and a toy example showing tree-manipulation behavior that is almost certainly wrong (though it is still not clear to me how to fix it). I've also shown that a naive implementation completes quickly on the same data.
On the topic of top-k patterns, I do not understand why it is desirable to mine the same candidates/patterns repeatedly. The current implementation will find the pattern "a b c" 3 times, when it could be found once and post-processed to create complete per-item lists. This seems needlessly inefficient, especially if you don't care about per-item lists. The way the code is structured, it works this way even in MapReduce mode, even though the input itemsets are pruned in such a way that post-processing is necessary anyway.
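To make the "find once, post-process" idea concrete, here is a rough sketch of how per-item top-k lists could be derived from a list of patterns that were each mined exactly once. The names (PerItemTopK, Pattern, perItemTopK) are made up for illustration and are not taken from the patch or the existing code. It assumes each item appears at most once per pattern and that ties in support may be broken arbitrarily.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class PerItemTopK {

    /** A mined pattern and its support; a hypothetical type for illustration. */
    static final class Pattern {
        final List<String> items;
        final long support;

        Pattern(long support, String... items) {
            this.items = Arrays.asList(items);
            this.support = support;
        }
    }

    /**
     * Builds, for every item, the k highest-support patterns containing it.
     * Each pattern in the input is visited once and fanned out to its items.
     */
    static Map<String, List<Pattern>> perItemTopK(List<Pattern> mined, int k) {
        Comparator<Pattern> bySupport = Comparator.comparingLong(p -> p.support);
        Map<String, PriorityQueue<Pattern>> heaps = new HashMap<>();

        for (Pattern p : mined) {
            for (String item : p.items) {
                // Min-heap on support, so the weakest pattern is evicted first.
                PriorityQueue<Pattern> heap =
                    heaps.computeIfAbsent(item, key -> new PriorityQueue<>(bySupport));
                heap.add(p);
                if (heap.size() > k) {
                    heap.poll();
                }
            }
        }

        // Turn each heap into a list sorted by descending support.
        Map<String, List<Pattern>> result = new HashMap<>();
        for (Map.Entry<String, PriorityQueue<Pattern>> e : heaps.entrySet()) {
            List<Pattern> sorted = new ArrayList<>(e.getValue());
            sorted.sort(bySupport.reversed());
            result.put(e.getKey(), sorted);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pattern> mined = Arrays.asList(
            new Pattern(5, "a", "b", "c"),
            new Pattern(8, "a", "b"),
            new Pattern(10, "a"));
        for (Map.Entry<String, List<Pattern>> e : perItemTopK(mined, 2).entrySet()) {
            for (Pattern p : e.getValue()) {
                System.out.println(e.getKey() + ": " + p.items + " support=" + p.support);
            }
        }
    }
}
```

With this shape, "a b c" is mined once and then simply lands in the lists for a, b, and c. If per-item lists are not needed, the post-processing step can be skipped entirely.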
You're right that there are a lot of boxed primitives in the patch; I also didn't pack the tree into an array representation. I aimed for something simple and quick to get it working. The boxed primitives in particular should be pretty easy to eliminate.
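For reference, this is roughly what I understand by an array-packed tree without boxed primitives: a minimal sketch of a prefix tree stored in parallel int arrays, with nodes addressed by index. The class and field names (PackedTree, firstChild, nextSibling, and so on) are hypothetical and do not reflect the layout of the patch or of the existing implementation. It assumes items have already been mapped to int ids and that each transaction arrives in a fixed global item order.

```java
import java.util.Arrays;

final class PackedTree {
    // Parallel arrays: index i describes node i. No node objects, no boxing.
    private int[] item;
    private int[] count;
    private int[] parent;
    private int[] firstChild;
    private int[] nextSibling;
    private int size;

    PackedTree(int capacity) {
        int n = Math.max(1, capacity);
        item = new int[n];
        count = new int[n];
        parent = new int[n];
        firstChild = new int[n];
        nextSibling = new int[n];
        newNode(-1, -1); // node 0 is the root
    }

    private int newNode(int itemId, int parentId) {
        if (size == item.length) {
            int n = item.length * 2;
            item = Arrays.copyOf(item, n);
            count = Arrays.copyOf(count, n);
            parent = Arrays.copyOf(parent, n);
            firstChild = Arrays.copyOf(firstChild, n);
            nextSibling = Arrays.copyOf(nextSibling, n);
        }
        item[size] = itemId;
        count[size] = 0;
        parent[size] = parentId;
        firstChild[size] = -1;
        nextSibling[size] = -1;
        return size++;
    }

    /** Returns the child of node holding itemId, or -1 if there is none. */
    int child(int node, int itemId) {
        int c = firstChild[node];
        while (c != -1 && item[c] != itemId) {
            c = nextSibling[c];
        }
        return c;
    }

    /** Inserts one transaction, adding weight to every node along its path. */
    void add(int[] transaction, int weight) {
        int node = 0;
        for (int id : transaction) {
            int c = child(node, id);
            if (c == -1) {
                c = newNode(id, node);
                nextSibling[c] = firstChild[node]; // link at head of child list
                firstChild[node] = c;
            }
            count[c] += weight;
            node = c;
        }
    }

    int count(int node) { return count[node]; }

    int size() { return size; }
}
```

The child lookup here is a linear scan over siblings, which keeps the sketch short; a real implementation would probably want something faster for nodes with many children.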