Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Invalid
Description
I'd like to generalize the SequenceFileInputFilter that was introduced in HADOOP-412 so that it can be applied to any InputFormat. To do this, I propose:
interface WritableFilter {
  boolean accept(Writable item);
}
class FilterInputFormat implements InputFormat {
  ...
}
FilterInputFormat would look in the JobConf for:
mapred.input.filter.source = the underlying input format
mapred.input.filter.filters = a list of class names that implement WritableFilter
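As a rough illustration of how the filter list might be resolved from the configuration, here is a minimal sketch. It is not taken from any patch: a plain Map stands in for JobConf, the list is assumed to be comma-separated, each filter is assumed to have a no-argument constructor, and Writable, AcceptAllFilter, and FilterConfig are hypothetical stand-ins defined only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Writable, so the sketch compiles on its own.
interface Writable {}

// The filter interface as proposed in the description.
interface WritableFilter {
    boolean accept(Writable item);
}

// Trivial example filter, used only for illustration.
final class AcceptAllFilter implements WritableFilter {
    @Override
    public boolean accept(Writable item) {
        return true;
    }
}

// Hypothetical helper; a Map<String, String> stands in for JobConf.
final class FilterConfig {
    static final String SOURCE_KEY = "mapred.input.filter.source";
    static final String FILTERS_KEY = "mapred.input.filter.filters";

    // Assumes a comma-separated list of class names, each with a
    // no-argument constructor. Blank entries are skipped.
    static List<WritableFilter> loadFilters(Map<String, String> conf) {
        List<WritableFilter> filters = new ArrayList<>();
        String names = conf.getOrDefault(FILTERS_KEY, "");
        for (String name : names.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            try {
                Class<? extends WritableFilter> cls =
                        Class.forName(trimmed).asSubclass(WritableFilter.class);
                filters.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Bad filter class: " + trimmed, e);
            }
        }
        return filters;
    }
}
```

The underlying input format named by mapred.input.filter.source could be resolved the same way; it is left out here to keep the sketch short.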
The FilterInputFormat will work like the current SequenceFileInputFilter, but use an internal RecordReader rather than the SequenceFile. This will require adding next(key) and getCurrentValue(value) methods to the RecordReader interface, but that will be addressed in a different issue.
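One way to read the wrapping idea is sketched below: a reader that delegates to an underlying reader and skips records whose key is rejected. This is only an illustration under several assumptions: filters are applied to the key, every filter must accept for a record to pass, and the two-step reader interface (here called KeyFirstRecordReader) is a hypothetical stand-in rather than Hadoop's actual RecordReader. IntBox and ArrayReader exist only to make the sketch self-contained, and the IOException that a real reader would declare is omitted.

```java
import java.util.List;

// Hypothetical stand-ins so the sketch compiles without Hadoop.
interface Writable {}

interface WritableFilter {
    boolean accept(Writable item);
}

// Assumed shape of the two proposed methods: read the key first,
// then fetch the value only for records that are kept.
interface KeyFirstRecordReader<K extends Writable, V extends Writable> {
    boolean next(K key);
    void getCurrentValue(V value);
}

// Wraps any reader and skips records whose key fails a filter.
final class FilteringRecordReader<K extends Writable, V extends Writable>
        implements KeyFirstRecordReader<K, V> {
    private final KeyFirstRecordReader<K, V> source;
    private final List<WritableFilter> filters;

    FilteringRecordReader(KeyFirstRecordReader<K, V> source, List<WritableFilter> filters) {
        this.source = source;
        this.filters = filters;
    }

    @Override
    public boolean next(K key) {
        // Advance the source until a key passes every filter or input ends.
        while (source.next(key)) {
            if (acceptedByAll(key)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void getCurrentValue(V value) {
        source.getCurrentValue(value);
    }

    private boolean acceptedByAll(Writable item) {
        for (WritableFilter filter : filters) {
            if (!filter.accept(item)) {
                return false;
            }
        }
        return true;
    }
}

// Mutable int holder playing the role of a Writable key or value.
final class IntBox implements Writable {
    int value;
}

// In-memory source reader over parallel key and value arrays.
final class ArrayReader implements KeyFirstRecordReader<IntBox, IntBox> {
    private final int[] keys;
    private final int[] values;
    private int pos = -1;

    ArrayReader(int[] keys, int[] values) {
        this.keys = keys;
        this.values = values;
    }

    @Override
    public boolean next(IntBox key) {
        if (pos + 1 >= keys.length) {
            return false;
        }
        pos++;
        key.value = keys[pos];
        return true;
    }

    @Override
    public void getCurrentValue(IntBox value) {
        value.value = values[pos];
    }
}
```

With a filter that accepts only even keys, wrapping an ArrayReader over keys 1, 2, 3, 4 would surface just the records with keys 2 and 4, and the values of the skipped records are never read.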
Attachments
Issue Links
is blocked by: HADOOP-3048 Stringifier (Closed)