I had suggested we not use both org.apache.mahout.common and org.apache.mahout.utils as the "common stuff" package, since that's redundant. We sort of standardized on common but retained utils for various reasons. I think this belongs in core/ and under .common.
The Random shouldn't be an instance variable, right? Shouldn't it be obtained from RandomUtils?
I like having it be injectable for testing purposes. As long as it exhibits the same interface as j.u.Random, we should be fine. There may be a better interface from RandomUtils. Feel free to suggest one, but I really do want to keep the injectability of the generator.
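To make the injectability point concrete, here is a minimal sketch of constructor injection. The class name InjectableSampler and its nextIndex method are hypothetical and not taken from the patch. A plain java.util.Random stands in for whatever RandomUtils would supply, so the example stays self-contained.

```java
import java.util.Random;

// Hypothetical stand-in for the class under review; all names are illustrative.
class InjectableSampler {
  private final Random random;

  // Default path: the class picks its own generator. In Mahout this might come
  // from RandomUtils instead; java.util.Random is used here as an assumption.
  InjectableSampler() {
    this(new Random());
  }

  // Test path: the caller supplies the generator, for example one with a fixed seed.
  InjectableSampler(Random random) {
    this.random = random;
  }

  // Returns an index in [0, bound) drawn from whichever generator was injected.
  int nextIndex(int bound) {
    return random.nextInt(bound);
  }
}
```

With this shape, a test can pass in two generators built from the same seed and expect identical sequences, while production code uses the no-argument constructor.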
It's not necessary to keep the original Iterator since, as you show, you really must sample it all upfront. In this sense it's almost a class that should produce a List rather than an Iterator, but I like the tidiness of an Iterator wrapper.
This is a point I waffled on. The real question here is whether we care about the corner case where we don't read anything from the iterator. I went slightly nuts and decided I did care to optimize that case, but you make a strong counterargument that the class could be simpler if copyInput were called from the constructor. That would simplify testing as well.
Consider providing an Iterable counterpart for easy use with foreach loops, like I did with SamplingIterable.
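For illustration, the simpler variant might look roughly like the sketch below, with the whole input sampled in the constructor and no reference kept to the original iterator. This is not the code from the patch: the name FixedSizeSampleIterator is borrowed from a suggestion in this thread, the constructor signature is an assumption, and reservoir sampling is assumed as the sampling scheme. It relies on Java 8 or later for the default Iterator.remove().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: samples eagerly in the constructor (assumed reservoir sampling).
class FixedSizeSampleIterator<T> implements Iterator<T> {
  // Iterator over the retained sample; the source iterator is not stored.
  private final Iterator<T> sample;

  FixedSizeSampleIterator(int size, Iterator<T> source, Random random) {
    List<T> buffer = new ArrayList<>(size);
    int seen = 0;
    while (source.hasNext()) {
      T item = source.next();
      seen++;
      if (buffer.size() < size) {
        // Fill the reservoir with the first `size` items.
        buffer.add(item);
      } else {
        // Afterwards, each new item overwrites a random slot with probability size / seen.
        int slot = random.nextInt(seen);
        if (slot < size) {
          buffer.set(slot, item);
        }
      }
    }
    this.sample = buffer.iterator();
  }

  @Override
  public boolean hasNext() {
    return sample.hasNext();
  }

  @Override
  public T next() {
    return sample.next();
  }
}
```

The trade-off is the one discussed above: the input is consumed even if nobody ever calls next(), but there is no lazy copy step left to expose or test separately.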
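A rough sketch of what such an Iterable counterpart could look like follows. The class name FixedSizeSampleIterable and its constructor are hypothetical, the sampling is done inline (assumed reservoir sampling) to keep the example self-contained, and it is not modeled on the actual SamplingIterable code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical Iterable wrapper; class name and constructor are assumptions.
class FixedSizeSampleIterable<T> implements Iterable<T> {
  private final int size;
  private final Iterable<T> source;
  private final Random random;

  FixedSizeSampleIterable(int size, Iterable<T> source, Random random) {
    this.size = size;
    this.source = source;
    this.random = random;
  }

  // Each call walks the source once and returns an iterator over a fresh sample.
  @Override
  public Iterator<T> iterator() {
    List<T> buffer = new ArrayList<>(size);
    int seen = 0;
    for (T item : source) {
      seen++;
      if (buffer.size() < size) {
        buffer.add(item);
      } else {
        // Replace a random slot with probability size / seen.
        int slot = random.nextInt(seen);
        if (slot < size) {
          buffer.set(slot, item);
        }
      }
    }
    return buffer.iterator();
  }
}
```

A caller can then write for (String s : new FixedSizeSampleIterable<>(2, lines, random)) without touching the iterator directly, where lines and random are whatever collection and generator the caller already has.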
Name it something ending with Iterator since it's an Iterator? FixedSizeSampleIterator?
Also a fine idea.
Are methods like copyInput() necessarily public, and is there a need to set the generator?
They could be package level. I merely exposed it to be able to do more detailed testing. This adds weight to your argument against keeping the original iterator.
Very picky, but I usually see test case names end in TestCase.
I usually see test cases that start or end with Test. It is an old convention from many ant builds that required regexes. I don't much care except that I would have a small preference for making abstract tests end in TestCase in order to distinguish them from concrete tests.
If you agree with these but don't care to implement them, I can do so.
Let me take one more crack.