For a variety of mining algorithms, it helps to have a uniform way to process only a subset of the records in a reducer.
To that end, I have written a simple generic sampler that wraps an Iterator and returns a fair sample of at most a specified size.
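For illustration only, here is a minimal sketch of one common way to draw a fair, bounded-size sample from an Iterator in a single pass: reservoir sampling (Algorithm R). This is not the code from the attached patch. The class name `ReservoirSampler` and the `sample` method are hypothetical, and the sketch assumes "fair" means that every record is equally likely to end up in the sample.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration; not the sampler from the patch.
final class ReservoirSampler {
  private ReservoirSampler() {}

  /**
   * Reads the source once and returns at most maxSize elements,
   * each input element being equally likely to be kept.
   */
  static <T> List<T> sample(Iterator<T> source, int maxSize, Random random) {
    if (maxSize < 0) {
      throw new IllegalArgumentException("maxSize must be non-negative");
    }
    List<T> reservoir = new ArrayList<>(maxSize);
    long seen = 0;
    while (source.hasNext()) {
      T item = source.next();
      seen++;
      if (reservoir.size() < maxSize) {
        // Fill the reservoir with the first maxSize elements.
        reservoir.add(item);
      } else {
        // Pick a slot uniformly in [0, seen); the new item replaces an
        // existing one only if the slot falls inside the reservoir,
        // which happens with probability maxSize / seen.
        long slot = (long) (random.nextDouble() * seen);
        if (slot < maxSize) {
          reservoir.set((int) slot, item);
        }
      }
    }
    return reservoir;
  }
}
```

In a reducer, the values iterator could be passed as `source`, so that downstream processing touches at most `maxSize` records regardless of how many values the key has. Memory use is bounded by `maxSize`, and inputs shorter than `maxSize` are returned in full.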
|Field||Original Value||New Value|
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Assignee||Ted Dunning [ tdunning ]||Sean Owen [ srowen ]|
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Assignee||Ted Dunning [ tdunning ]|