Type: New Feature
Affects Version/s: None
Fix Version/s: v1.4.0
The SpaceSaving (TopN algorithm) code can be copied from https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java
We don't need the whole of stream-lib; one or two classes are enough. Make sure to give credit to stream-lib in the class comment.
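For reference, a minimal sketch of the SpaceSaving idea, assuming String items and a plain HashMap. The class name SpaceSavingSketch is hypothetical, and this is not the stream-lib StreamSummary code: StreamSummary keeps counters in linked buckets so the minimum is found quickly, while this version scans all counters on eviction.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified SpaceSaving counter set (illustrative only, not stream-lib's StreamSummary). */
final class SpaceSavingSketch {
    private final int capacity;
    private final Map<String, Long> counts = new HashMap<>();

    SpaceSavingSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Record one occurrence of item. */
    void offer(String item) {
        Long current = counts.get(item);
        if (current != null) {
            // Item is already monitored: just bump its counter.
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            // Free slot available: start monitoring the item.
            counts.put(item, 1L);
        } else {
            // Summary is full: replace the item with the smallest counter.
            // The newcomer inherits that counter plus one (possible overestimate).
            String minKey = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                if (e.getValue() < min) {
                    min = e.getValue();
                    minKey = e.getKey();
                }
            }
            counts.remove(minKey);
            counts.put(item, min + 1);
        }
    }

    /** Copy of the current counters (item to estimated count). */
    Map<String, Long> counters() {
        return new HashMap<>(counts);
    }
}
```

With capacity 2 and the stream a, a, a, b, c, the counters end up as a=3 and c=2: b is evicted when c arrives, and c inherits b's count plus one.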
To run SpaceSaving in parallel, the partial TopN results have to be merged using the approach in http://arxiv.org/pdf/1401.0702.pdf. I found no existing implementation, so we have to implement it ourselves.
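A rough sketch of what the merge could look like, under an assumed merge rule that should be checked against the paper before implementing: an item present in both summaries gets the sum of its counts; an item present in only one summary gets its count plus the other summary's minimum counter (or 0 if the other summary is not full); then only the k largest counters are kept. The names TopNMergeSketch, merge and minOrZero are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative merge of two SpaceSaving summaries (assumed rule, see note above). */
final class TopNMergeSketch {

    /** Smallest counter if the summary is full (k counters), otherwise 0. */
    static long minOrZero(Map<String, Long> summary, int k) {
        if (summary.size() < k) {
            return 0L;
        }
        return Collections.min(summary.values());
    }

    /** Merge summaries a and b, each holding at most k counters. */
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        long minA = minOrZero(a, k);
        long minB = minOrZero(b, k);

        Map<String, Long> combined = new HashMap<>();
        for (Map.Entry<String, Long> e : a.entrySet()) {
            Long other = b.get(e.getKey());
            // Present in both: add counts. Only in a: add b's minimum as the bound.
            combined.put(e.getKey(), e.getValue() + (other != null ? other : minB));
        }
        for (Map.Entry<String, Long> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                // Only in b: add a's minimum as the bound.
                combined.put(e.getKey(), e.getValue() + minA);
            }
        }

        // Keep the k largest counters, in descending order of count.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(combined.entrySet());
        sorted.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : sorted.subList(0, Math.min(k, sorted.size()))) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

For example, with k = 2, merging {x=5, y=3} and {z=6, x=2} gives z=9 and x=7; y (3 + 2 = 5) is dropped.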
From: Li, Yang
Sent: August 7, 2015 12:43
Subject: Distributed TopN papers
The basic algorithm
Its application in distributed systems