I wanted to take a stab at adding the missing parallel tests that Joel
alluded to in his most recent comment.
When I went to pull it down though, I realized that this patch no longer
applies cleanly on top of the recent changes to ReduceOperation/ReducerStream.
The main highlights of the recent ReducerStream changes are:
1.) ReducerStream now requires a ReduceOperation.
2.) Currently, the only ReduceOperation implementation is GroupOperation.
3.) GroupOperation requires a StreamComparator and an int 'size'. The
size limits the number of tuples held in each grouping.
When that upper bound is reached, the least tuple is dropped (according to the
comparator).
4.) The only StreamComparator implementations are FieldComparator and
MultiFieldComparator, both of which require a field name.
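
To make point 3 concrete, here is a rough sketch of what "keep at most 'size' tuples per group, dropping the least" amounts to. This is not Solr's GroupOperation: the class and method names (BoundedGroupSketch, keepLargest) are made up for illustration, and plain integers stand in for tuples.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedGroupSketch {

    // Hypothetical stand-in for a size-bounded group; NOT Solr's GroupOperation.
    // Keeps at most `size` elements, evicting the least one (per `comp`)
    // whenever the bound is exceeded.
    static <T> List<T> keepLargest(List<T> tuples, Comparator<T> comp, int size) {
        // Min-heap: the head is always the least element according to comp.
        PriorityQueue<T> heap = new PriorityQueue<>(comp);
        for (T t : tuples) {
            heap.add(t);
            if (heap.size() > size) {
                heap.poll(); // over the bound: drop the least element
            }
        }
        List<T> kept = new ArrayList<>(heap);
        kept.sort(comp); // heap iteration order is unspecified, so sort for display
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> kept =
            keepLargest(List.of(5, 1, 4, 2, 3), Comparator.naturalOrder(), 3);
        System.out.println(kept); // prints [3, 4, 5]
    }
}
```

The point for this issue is the eviction step: any bounded group silently discards input once the limit is hit, and it needs a comparator to decide what to discard.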
The net effect of these changes is that IntersectStream and ComplementStream need
a field name at creation time (because they rely on ReducerStream, which relies on
GroupOperation, which relies on a StreamComparator).
As I see it, IntersectStream and ComplementStream shouldn't need
this chain of objects. AFAICT, since their job is to do logical operations,
it'd be wrong for their internal ReducerStream to drop tuples based on an
arbitrary limit. And since we don't want to drop tuples, there's no need for a
comparator.
Two resolutions come to mind here:
1.) Modify GroupOperation so that the 'size' (and comparator) can be optional.
2.) Create a no-op StreamComparator, or one that always returns "equal", to pass
into the existing GroupOperation.
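
For comparison, here is a minimal sketch of both options side by side. Everything in it is hypothetical (OptionalBoundSketch, group, alwaysEqual, and the UNBOUNDED sentinel are not Solr APIs), and strings stand in for tuples; it only illustrates the shape of each approach.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OptionalBoundSketch {

    // Hypothetical sentinel meaning "no size limit".
    static final int UNBOUNDED = -1;

    // Option 1: size and comparator are optional; callers that only need
    // grouping use this overload and never touch a comparator.
    static <T> List<T> group(List<T> tuples) {
        return group(tuples, null, UNBOUNDED);
    }

    // Bounded variant: keeps the `size` greatest elements according to `comp`.
    static <T> List<T> group(List<T> tuples, Comparator<T> comp, int size) {
        List<T> kept = new ArrayList<>(tuples);
        if (size == UNBOUNDED || comp == null) {
            return kept; // nothing is dropped, input order is preserved
        }
        kept.sort(comp.reversed()); // greatest first
        return new ArrayList<>(kept.subList(0, Math.min(size, kept.size())));
    }

    // Option 2: a comparator that treats every pair as equal.
    static <T> Comparator<T> alwaysEqual() {
        return (a, b) -> 0;
    }

    public static void main(String[] args) {
        List<String> tuples = List.of("b", "a", "c");

        // Option 1: no comparator, no size.
        System.out.println(group(tuples)); // prints [b, a, c]

        // Option 2: dummy comparator, plus a size large enough to never apply.
        System.out.println(group(tuples, alwaysEqual(), Integer.MAX_VALUE)); // prints [b, a, c]
    }
}
```

In this sketch, option 2 still forces the caller to invent a size (Integer.MAX_VALUE here) on top of the dummy comparator, which is part of why it feels like the hackier of the two.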
I'm leaning towards the first option. It seems more generally useful, and creating
a no-op class seems like a bit of a hack.
Anyone have opinions/thoughts on this? Have I missed something obvious/simple here,
or misread the code entirely? Is there another option to resolve this conflict that
I haven't considered?
In any case, I just wanted to get some feedback on the best way to handle this change
before I move on to actually adding the new tests.