Details
-
Umbrella
-
Status: In Progress
-
Major
-
Resolution: Unresolved
-
0.8.1
-
None
-
None
Description
I am curious what the plan is for the long-term future of MRUNIT.
I think MRUNIT is currently useful for unit testing a single mapper or reducer, but there is a void when it comes to testing more complicated features such as MultipleInputs, MultipleOutputs, driver classes, and counters, among other things. Instead of adding support for these features to the current MRUNIT framework, I wonder if it would be more useful to add hooks to the existing LocalJobRunner and MiniMRCluster classes that make it easier to verify file output from text files, sequence files, etc. This would allow MRUNIT to test driver classes, MultipleInputs, MultipleOutputs, etc. MRUNIT would also then test against the real Hadoop code instead of an implementation that mimics Hadoop, which can miss bugs; for example, ReduceDriver did not reuse the same object until 0.8.0. MRUNIT would also keep up with new MapReduce features without us having to implement fake versions of them.
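To make the object-reuse point concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. Hadoop hands the reducer the same value object on every iteration and only changes its contents. The names `MutableInt` and `ObjectReuseDemo` are hypothetical stand-ins invented for this illustration; they are not MRUNIT or Hadoop classes. A test harness that allocates a fresh object per value would let the buggy variant below pass.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable Hadoop value type such as IntWritable.
final class MutableInt {
    int value;
}

final class ObjectReuseDemo {

    // Walks over the raw ints but hands out the SAME MutableInt each time,
    // imitating how the real framework reuses value objects in reduce().
    static Iterable<MutableInt> reusing(int... raw) {
        return () -> new Iterator<MutableInt>() {
            private final MutableInt shared = new MutableInt();
            private int next = 0;

            @Override public boolean hasNext() { return next < raw.length; }

            @Override public MutableInt next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.value = raw[next++];
                return shared;
            }
        };
    }

    // Buggy reducer body: keeps references to the reused object,
    // so every kept element ends up showing the last value.
    static List<Integer> collectByReference(Iterable<MutableInt> values) {
        List<MutableInt> kept = new ArrayList<>();
        for (MutableInt v : values) kept.add(v);
        List<Integer> out = new ArrayList<>();
        for (MutableInt v : kept) out.add(v.value);
        return out;
    }

    // Correct reducer body: copies the content out before the next iteration.
    static List<Integer> collectByCopy(Iterable<MutableInt> values) {
        List<Integer> out = new ArrayList<>();
        for (MutableInt v : values) out.add(v.value);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(reusing(1, 2, 3))); // [3, 3, 3]
        System.out.println(collectByCopy(reusing(1, 2, 3)));      // [1, 2, 3]
    }
}
```

Running the tests through the real job runner would expose this class of bug automatically, since the reuse happens in Hadoop itself rather than in a reimplementation.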
I understand that performance would be an issue due to the file I/O, but I wonder how fast the LocalJobRunner would be if we wrote a new class extending FileSystem that lets users write fake files to memory, and made the LocalJobRunner read from them.
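As a rough illustration of the in-memory idea, the sketch below keeps "files" in a map of path to bytes and runs a toy word count against it. It is plain Java and deliberately does not extend Hadoop's FileSystem; `InMemoryFiles`, `InMemoryWordCount`, and the paths used are all hypothetical names chosen for this example. A real implementation would have to subclass FileSystem and be registered with the job configuration, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store; NOT Hadoop's FileSystem API.
final class InMemoryFiles {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    // Returns a stream whose bytes become visible under `path` on close().
    OutputStream create(String path) {
        return new ByteArrayOutputStream() {
            @Override public void close() { files.put(path, toByteArray()); }
        };
    }

    InputStream open(String path) throws FileNotFoundException {
        byte[] data = files.get(path);
        if (data == null) throw new FileNotFoundException(path);
        return new ByteArrayInputStream(data);
    }

    void writeText(String path, String text) throws IOException {
        try (OutputStream out = create(path)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    String readText(String path) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        try (InputStream in = open(path)) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) copy.write(buf, 0, n);
        }
        return new String(copy.toByteArray(), StandardCharsets.UTF_8);
    }
}

final class InMemoryWordCount {

    // Toy "job": reads lines from `in`, writes sorted "word<TAB>count" lines to `out`.
    static void run(InMemoryFiles fs, String in, String out) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(in), StandardCharsets.UTF_8))) {
            for (String line; (line = reader.readLine()) != null; ) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
                }
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        fs.writeText(out, sb.toString());
    }

    // Writes a fake input file, runs the job, and returns the output text.
    static String demo() {
        try {
            InMemoryFiles fs = new InMemoryFiles();
            fs.writeText("/in/part-0", "b a\na b a\n");
            run(fs, "/in/part-0", "/out/part-0");
            return fs.readText("/out/part-0");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo()); // a<TAB>3, then b<TAB>2
    }
}
```

The test never touches disk, so the remaining cost would be job setup inside the runner itself, which is the part that would need measuring.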
Attachments
1. create new branch for the new api | Resolved | Jim Donofrio
2. add this branch to jenkins | Resolved | Brock Noland
3. determine package structure | Open | Unassigned
4. run a generic Map Reduce driver (via local job runner) and validating the output | Open | Unassigned
5. merge back into trunk, deprecate old api | Open | Unassigned