Hadoop Map/Reduce
MAPREDUCE-350

Generalize the SequenceFileInputFilter to apply to any InputFormat

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      I'd like to generalize the SequenceFileInputFilter that was introduced in HADOOP-412 so that it can be applied to any InputFormat. To do this, I propose:

      interface WritableFilter {
        boolean accept(Writable item);
      }

      class FilterInputFormat implements InputFormat {
        ...
      }

      FilterInputFormat would look in the JobConf for:
      • mapred.input.filter.source = the underlying input format
      • mapred.input.filter.filters = a list of class names that implement WritableFilter

      The FilterInputFormat will work like the current SequenceFileInputFilter, but use an internal RecordReader rather than the SequenceFile directly. This will require adding next(key) and getCurrentValue(value) methods to the RecordReader interface, but that will be addressed in a different issue.

      Attachments

      1. filtering_v3.patch
        97 kB
        Enis Soztutar
      2. filtering_v2.patch
        144 kB
        Enis Soztutar
      3. filterinputformat_v1.patch
        63 kB
        Enis Soztutar

          Activity

          Owen O'Malley added a comment -

          Oops, I should use the structure of SequenceFileInputFilter and do:

          public class FilterInputFormat implements InputFormat {
            public static interface Filter {
              boolean accept(Writable key);
            }
            ...
          }

          Enis Soztutar added a comment -

          Here is the patch to generalize the filtering capability on input records.

          The patch

          • deprecates SequenceFileInputFilter
          • introduces the o.a.h.io.filter package
          • refactors SequenceFileInputFilter.Filter to o.a.h.io.filter.WritableFilter, SequenceFileInputFilter.RegexFilter to o.a.h.io.filter.RegexFilter, etc.
          • introduces new filters DefaultFilter, ItemFilter, SetFilter, and RangeFilter
          • introduces FilterRecordReader, which applies the provided filter to the records returned by the underlying RecordReader
          • introduces FilterInputFormat, a decorator that adds filtering capability; it works with legacy code with no change
          Enis Soztutar added a comment -

          Marking it for 0.17.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12373832/filterinputformat_v1.patch
          against trunk revision 614721.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1655/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1655/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1655/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1655/console

          This message is automatically generated.

          Chris Douglas added a comment - edited

          • It looks like one can't meaningfully nest FilterInputFormats. A check in initBaseInputFormat that rejects an attempt to do this would probably be a good idea.
          • IIRC, Configuration::IntegerRanges is limited to positive integers, so the default range of (Integer.MIN_VALUE + "-" + Integer.MAX_VALUE) in IntRangeFilter::setConf may be invalid.
          • RangeFilter should probably accept a WritableComparator to support user types and an alternative syntax for projections.
          • Would it be too constraining to limit SetFilter and ItemFilter to Text? Filtering based on the string representation of Writables seems like an overly general strategy.

          The patch looks good; this should be very useful.

          Enis Soztutar added a comment -

          Thanks very much for the review.

          It looks like one can't meaningfully nest FilterInputFormats. A check in initBaseInputFormat that rejects an attempt to do this would probably be a good idea.

          In my opinion, people are more likely to nest WritableFilters than FilterInputFormats for multi-stage filtering (for example, to filter by range and by percent). However, I do not expect chaining of more than one filter to be common (one can always write a ChainFilter, configured to apply several Filters sequentially).

          I think people will understand that nesting FilterInputFormats cannot be done with the current API:

          job.setInputFormat(FilterInputFormat.class);
          FilterInputFormat.setBaseInputFormat(job, FilterInputFormat.class); // nesting
          // now we should set the actual InputFormat
          FilterInputFormat.setBaseInputFormat(job, TextInputFormat.class);   // already confusing

          IIRC, Configuration::IntegerRanges is limited to positive integers, so the default range of (Integer.MIN_VALUE + "-" + Integer.MAX_VALUE) in IntRangeFilter::setConf may be invalid.

          Right, I had missed that IntegerRanges is limited to positive integers. I think we should make IntRangeFilter extend ComparableRangeFilter instead (not using IntegerRanges anymore).

          RangeFilter should probably accept a WritableComparator to support user types and an alternative syntax for projections

          Please see below.

          Would it be too constraining to limit SetFilter and ItemFilter to Text? Filtering based on the string representation of Writables seems like an overly general strategy.

          The main reason behind this weird strategy of using string comparison on serialized versions of the Writables is that we need to somehow pass the specified Writables (for example, min and max values) to the tasks, and currently the only way to do this is to store them in the configuration. It would be great if we had setWritable() and getWritable() methods in the configuration, so that we could directly compare WritableComparables (possibly using WritableComparators); however, the proposal to add these methods was rejected (I cannot remember the issue number).
          Another solution might be adding an interface, for example WritableStorable or WritableDeserializer, which would provide a Writable forName(String) method.

          If you see a better solution for passing the Writables to the tasks, I will be very glad to adopt it. Or should we add setWritable()/getWritable() to the Configuration?

          Chris Douglas added a comment -

          I think people will understand that nesting FilterInputFormat cannot be done with the current API

          True. The case I had in mind wasn't so much nesting as it was simultaneous instantiation. For example, the classes in mapred.join wouldn't hesitate to accept multiple FilterInputFormats for multiple datasources, but- as you point out- the API is such that one would quickly realize that chained filters aren't possible without some effort. I was hoping that these classes could be integrated into the aforementioned framework and I'm confident they can be.

          On that note, the MD5PercentFilter guards access to the MessageDigest within the instance, but multiple instances could corrupt it. It would probably be better if it were not static.

          Also, since a filter may discard the vast majority of the input, is it necessary to update the reporter to avoid a timeout? A call to next may churn through data for some time, and I'm uncertain whether one can expect the base InputFormat to keep the task alive. I'd expect it to be fine for the majority of cases, but if you felt like being paranoid it's not insane.

          If you see a better solution to pass the Writables to the tasks, I will be very glad to adopt it. Or should we add setWritable() getWritable() to the Configuration?

          I don't, sorry. I remember the JIRA you mention, the rejection of get/setWritable, and the reasoning probably remains sound. Other than the solutions you propose, the only other way I can think of would be to have an auxiliary InputFormat/input dir that slurps a set of keys (no splits!) into an in-memory Set and assume that the OOM exceptions are a strong hint to the user. Gross.

          Still, you could probably still restrict these to Text, as long as the user is aware of SequenceFileAsTextInputFormat and related options. Automatically converting to String could produce some weird results if one isn't aware of how the filter is effected. Forcing someone to figure out how to get their WritableComparables to Text is ample warning, I think.

          Enis Soztutar added a comment -

          The case I had in mind wasn't so much nesting as it was simultaneous instantiation. For example, the classes in mapred.join wouldn't hesitate to accept multiple FilterInputFormats for multiple datasources, but- as you point out- the API is such that one would quickly realize that chained filters aren't possible without some effort. I was hoping that these classes could be integrated into the aforementioned framework and I'm confident they can be.

          I did not think about the join framework. Having a look at it, I guess we can still stick with the current framework. The reason is that FilterInputFormat filters based on keys, and the join framework performs joins based on keys, so when we perform joins we would want to use the same filter for all the datasets. Personally I have not tried it, but I guess the following might work:

          // set join expression as usual
          job.set("mapred.join.expr",
              "inner(tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,"
            + "\"hdfs://host:8020/foo/bar\"),"
            + "tbl(org.apache.hadoop.mapred.SequenceFileInputFormat.class,"
            + "\"hdfs://host:8020/foo/baz\"))");
          // wrap CompositeInputFormat with FilterInputFormat
          job.setInputFormat(FilterInputFormat.class);
          FilterInputFormat.setBaseInputFormat(job, CompositeInputFormat.class);

          Do you think this will do the trick?

          On that note, the MD5PercentFilter guards access to the MessageDigest within the instance, but multiple instances could corrupt it. It would probably be better if it were not static.

          I think the current implementation is OK, since we update and digest the MessageDigest only in the MD5Hashcode function, which is already synchronized.

          Also, since a filter may discard the vast majority of the input, is it necessary to update the reporter to avoid a timeout? A call to next may churn through data for some time, and I'm uncertain whether one can expect the base InputFormat to keep the task alive. I'd expect it to be fine for the majority of cases, but if you felt like being paranoid it's not insane.

          I think we had better be pragmatic about this one. Let's not spend a nontrivial amount of effort on this; we can fix it if it is exploited in some way.

          Still, you could probably still restrict these to Text

          I envision that, given this implementation, filters will mostly be used with Text anyway, but I do not think we should limit their use. The fact that keys are converted to strings before comparison/regex matching is clearly documented in the javadocs of the respective filters. People are expected to read the javadocs before using the classes. :)

          Enis Soztutar added a comment -

          Cancelling the patch until I find some time to fix IntRangeFilter.

          Chris Douglas added a comment -

          I did not think about the join framework. Having a look at it, i guess we can still stick with the current framework.

          I think your example would work, but I was considering filters at arbitrary positions in the join. I was thinking of adding a new node to the parser that accepts a Filter and an argument (the range, the regexp, etc) and sets the filter expression prior to the instantiation of the RecordReader (as it does for mapred.input.dir). Both should work.

          I think current implementation is OK, since we are updating and digesting the MessageDigest in only the MD5Hashcode function which is already synchronized.

          The MD5Hashcode function is synchronized on the instance, but it's protecting a static. Unless there's only one instance of the MD5PercentFilter, synchronizing on the method is insufficient, no?

          I think we better be pragmatic about this one. Lets not spend some nontrivial amount of effort on this. We can fix it if it is exploited in some way.

          *nod* Again, I think it'll be fine for the majority of cases, but I thought I'd mention it.

          People are expected to read the javadocs before using the classes.

          Well, fair enough. Really, it only supports Text, and this seems like a convenient way to annotate the class since it's not difficult to effect the translation. Further, toString isn't usually considered in the Comparable/equals/hashCode family of equality, so it seems risky.

          Owen O'Malley added a comment -

          I agree with Chris. Using obj1.toString().equals(obj2.toString()) is both very expensive and likely to break. We got 2x speedups out of TextOutputFormat by avoiding an obj.toString().getBytes("UTF-8") data conversion pair. I'd be very worried about the performance if you are doing toString for compares.

          Enis Soztutar added a comment -

          After spending some time thinking about this patch, I have redesigned the API. The changes are:

          • Refactored WritableFilter to Filter, so that Filter can be applied to non-Writables (according to the Serialization framework)
          • Added a Stringifier interface and a default implementation using the Hadoop serialization framework. Now ordinary objects can be kept in the configuration. Given the performance loss of String.equals() comparisons, we had to either pass the actual objects through the configuration or not filter at all.
          • Added FilterEngine to evaluate postfix filter expressions
          • Added OR, AND, NOT filters
          • Fixed the synchronization issue in MessageDigest
          • Moved filtering into the core framework instead of a library
          • Changed the API so that JobConf is now used to add filters. This API is better since it hides nearly all the details from the application code; the application just configures filters by calling JobConf#addFilter().
          • Added a counter for filtered-out records
          • Added a filtering section to the mapred tutorial
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378052/filtering_v2.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 20 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs -1. The patch appears to cause Findbugs to fail.

          core tests -1. The patch failed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1983/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1983/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1983/console

          This message is automatically generated.

          Enis Soztutar added a comment -

          The previous patch was generated incorrectly. This one should do the trick.

          Enis Soztutar added a comment -

          Resubmitting so that Hudson picks up the patch.

          Chris Douglas added a comment -

          Wow, looks like quite a rewrite! A few points:

          • Would it be possible to separate the Stringify functionality into a separate JIRA? It is independently useful functionality and would be easier to review if it were extracted from an already formidable patch. I like this idea.
            • DefaultStringifier could reuse more of the objects it creates. The serializer will hold a reference to the ByteArray(Input/Output)Buffers you've defined, so re-opening and closing them every time is unnecessary if you use a resettable stream like o.a.h.io.Data(Input/Output)Buffer.
            • The static convenience methods (load, store, storeArray) that create new Stringifiers in DefaultStringifier are probably unnecessary. Adding a stringified object to a config doesn't need that much support. Further, it's not clear to me how one would register a Stringifier that isn't a DefaultStringifier. Would it make sense to follow the pattern in the Serialization framework from HADOOP-1986- or the WritableComparator static initialization- and register custom Stringifiers for classes?
          • Your previous version- where this was library, not framework code- is preferred. The FilterRecordReader should be part of a FilterInputFormat, which is- as you illustrated with your earlier patch- not an undue burden on a user. Weaving this into Task and MapTask doesn't seem to add any additional functionality or syntactic elegance. You could add counters to the library code without changing core functionality. Being part of Task also prevents it from being integrated into other libraries (like the aforementioned join framework).
          • There are no progress updates as you churn through records in your filtering engine. You might want to keep a reporter around to prevent timeouts.
          • It would be much better if one could employ a (Raw)Comparator instead of using compareTo in RangeFilter and equals in ItemFilter. Using JobConf::getOutputKeyComparator would be a good start, though one could imagine uses where one would want different comparators for different filters.
          • It is unnecessary to distinguish between Filters and FunctionFilters. A single eval method accepting the key and stack is sufficient (and permits functions like AND and OR to short-circuit, somewhat). Forcing the application of every filter should probably be avoided when possible (see next bullet).
          • FilterEngine uses a static method and int to track the insertion order of filters and their associated properties, which is probably not the correct approach. Consider a case where a client submits two jobs, both with filters. Given that you require the user to add their expression in postfix notation, it doesn't seem unduly onerous to require them to track the position of their filters and avoid keeping a count. That said, assigning sequential identifiers to your filter parameters is probably not an ideal approach in itself. You might consider writing a parser or creating a Stringify-able object hierarchy (as in Swing, sort of) for your expressions. Something that looks more like an expression parse tree would probably effect a more efficient data structure.
          • Why is the MessageDigest static in MD5PercentFilter? Unless it is expensive to create, the synchronization will probably cost more than making this an instance variable.

          Well done! I look forward to this.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378107/filtering_v3.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 15 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac -1. The applied patch generated 584 javac compiler warnings (more than the trunk's current 582 warnings).

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1991/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1991/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1991/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1991/console

          This message is automatically generated.

          Enis Soztutar added a comment -

          Wow, looks like quite a rewrite! A few points:

          Indeed it is. :)

          Would it be possible to separate the Stringify functionality into a separate JIRA? It is independently useful functionality and would be easier to review if it were extracted from an already formidable patch. I like this idea.

          Done in HADOOP-3048.

          Your previous version- where this was library, not framework code- is preferred. The FilterRecordReader should be part of a FilterInputFormat, which is- as you illustrated with your earlier patch- not an undue burden on a user. Weaving this into Task and MapTask doesn't seem to add any additional functionality or syntactic elegance. You could add counters to the library code without changing core functionality. Being part of Task also prevents it from being integrated into other libraries (like the aforementioned join framework).

          Implementing this functionality as a library was good, and as you mentioned, there are lots of benefits to doing so. However, after spending some time on this I realized implementing it in the core framework would be even better, because:

          • Theoretically every job can/should use the filtering functionality, since there are no drawbacks but lots of benefits. This would mean the InputFormat of every job should be FilterInputFormat, hiding the real InputFormat behind FilterInputFormat#setBaseInputFormat().
          • There is a lot of legacy code which can benefit from this, but people will be reluctant (or lazy) to convert their job's input format to a filtering one, so we should aim for maximum usability with minimum code change.
          • Although the functionality is in the core, we only change a few lines (apart from FilterRecordReader) in the Task and MapTask classes, effectively encapsulating the functionality. We could extract FilterRecordReader into its own class so that it is completely separate. I should note that join can readily use filtering: the record is filtered before being passed to the mapper, so the joined keys would be filtered.

          At the risk of repeating myself, I believe that the InputFormat of every job should be FilterInputFormat, which is why we should integrate this into the core. However, I am open to suggestions and discussion. :)

          There are no progress updates as you churn through records in your filtering engine. You might want to keep a reporter around to prevent timeouts.

          I will check that

          It would be much better if one could employ a (Raw)Comparator instead of using compareTo in RangeFilter and equals in ItemFilter. Using JobConf::getOutputKeyComparator would be a good start, though one could imagine uses where one would want different comparators for different filters.

          I will check that

          It is unnecessary to distinguish between Filters and FunctionFilters. A single eval method accepting the key and stack is sufficient (and permits functions like AND and OR to short-circuit, somewhat). Forcing the application of every filter should probably be avoided when possible (see next bullet).

          Yes, you're right (indeed, I did it that way at first). However, it is very unlikely that a user will implement a FunctionFilter, and quite likely that she will implement a Filter. Thus adding a stack argument that no plain filter uses seems confusing and unnecessary. Consider the javadoc for the stack argument in Filter#accept() reading "@param stack filters should not use this".

          FilterEngine uses a static method and int to track the insertion order of filters and their associated properties, which is probably not the correct approach. Consider a case where a client submits two jobs, both with filters. Given that you require the user to add their expression in postfix notation, it doesn't seem unduly onerous to require them to track the position of their filters and avoid keeping a count. That said, assigning sequential identifiers to your filter parameters is probably not an ideal approach in itself. You might consider writing a parser or creating a Stringify-able object hierarchy (as in Swing, sort of) for your expressions. Something that looks more like an expression parse tree would probably effect a more efficient data structure.

          I also think the best way to do the filtering is to create the object hierarchy and pass it through the configuration, but somehow I went the postfix-expression way. I should think of a better approach; serializing the filters -> stringify -> store in conf -> pass to Task -> load from conf -> destringify -> deserialize might work.

          Why is the MessageDigest static in MD5PercentFilter? Unless it is expensive to create, the synchronization will probably cost more than making this an instance variable.

          I just took the existing MD5PercentFilter implementation, which used a static digest, and I assumed that MessageDigest.getInstance() returns a singleton instance of the object. However, it seems that is not the case. I will fix this.

          Well done! I look forward to this.

          Me too; thanks for the in-depth review, Chris.

          Chris Douglas added a comment -

          Theoretically every job can/should use the filtering functionality, since there is no drawback but lots of benefits. So this necessitates that the InputFormats of every job should be FilterInputFormat, shading the real InputFormat under FilterInputFormat#setBaseInputFormat().

          Even if every job can use the filtering functionality, integrating it into Task/MapTask limits where it may be applied. If, for example, one were reading from multiple sources, different sets of filters could be applied to each source. Similarly, a map or a reduce task could use a filtering record reader to read a subset of records indirectly. If it's limited to the interfaces you provide to MapTask, then this code can't be reused elsewhere. Again, since weaving it into core doesn't seem to give you extra functionality- it seems to make it less general- and there's zero performance hit, making it a library looks laced with win.

          There is a lot of legacy code which can benefit from this, but people will be reluctant (or lazy) to convert their job's input format to filter. So maximum usability and minimum code change should be aimed.

          I disagree, and I cite your previous patch. Its interface was not only easier to understand than the postfix additions, but specifying the baseInputFormat was very intuitive. For users seeking to benefit from this, the difficulty delta between the library and Task implementations is so slight that I doubt it'll actually prevent someone from taking advantage of it.

          Although the functionality is at the core, we only change a few lines(except FilterRR) from the Task and MapTask classes, effectively encapsulating the functionality. We may extract FilterRecordReader to its own class, so that it is completely separate. I should note that join can readily use filtering. The filtering just filters before passing the record to the mapper, so the joined keys would be filtered.

          Not exactly. If I apply a RangeFilter to each of my record readers, the join considers a smaller subset of the records read. Since it's generating the cross of all the matching records (i.e. sets A, B, C sharing key k and containing values x1, x2 would emit [a1, b1, c1], [a1, b1, c2], [a1, b2, c1], ... [a2, b2, c2]), my filter would have to reject the cross of all those records, rather than each individually. Further, if I only want to filter the records from B in the previous example, the filter in my map would need more state to ensure I'm not emitting duplicate records to the map (or my map code would have to deal with that). One can imagine other cases where, again, filtering shouldn't be limited to a single part of the job, or cases where it might change the result if filters can only be applied at a certain stage.

          However it is very unlikely that a user may implement a FunctionFilter, but it is quite likely that she can implement a Filter. Thus adding a stack argument that no filter uses seems confusing and unnecessary. Consider the javadoc for the stack argument in the Filter#accept() method being as "@param stack filters should not use this".

          Though Filters will be useful, the semantics of a FunctionFilter aren't so mysterious that people won't want to write those, too. Again, the purpose of both parameters is easily explained, and people will decide whether they should employ them or not. It seems premature to decide that there are only two types of filters, anyway. It sounds like we agree that it's a cleaner interface with only one signature for the eval; I'm just not sure I see the extensibility benefit as clearly.

          Hide
          Owen O'Malley added a comment -

          A big +1 to leaving this in library code rather than framework.

          Devaraj Das added a comment -

          Cancelling patch as it is under discussion

          Enis Soztutar added a comment -

          Even if every job can use the filtering functionality, integrating it into Task/MapTask limits where it may be applied. If, for example, one were reading from multiple sources, different sets of filters could be applied to each source. Similarly, a map or a reduce task could use a filtering record reader to read a subset of records indirectly. If it's limited to the interfaces you provide to MapTask, then this code can't be reused elsewhere. Again, since weaving it into core doesn't seem to give you extra functionality- it seems to make it less general- and there's zero performance hit, making it a library looks laced with win.

          Please see below.

          Not exactly. If I apply a RangeFilter to each of my record readers, the join considers a smaller subset of the records read. Since it's generating the cross of all the matching records (i.e. sets A, B, C sharing key k and containing values x1, x2 would emit [a1, b1, c1], [a1, b1, c2], [a1, b2, c1], ... [a2, b2, c2]), my filter would have to reject the cross of all those records, rather than each individually. Further, if I only want to filter the records from B in the previous example, the filter in my map would need more state to ensure I'm not emitting duplicate records to the map (or my map code would have to deal with that). One can imagine other cases where, again, filtering shouldn't be limited to a single part of the job, or cases where it might change the result if filters can only be applied at a certain stage.

          What we discuss differs in just one part.
          As a library:

          • FilterRecordReader same as before
          • classes under io.filter - same as before
          • FilterInputFormat as a thin wrapper
          • Configure filters from FilterInputFormat (delegating to FilterEngine)
          • Users may freely extend / use FilterInputFormat, FilterRecordReader, and Filters.

          As integrated into the core:

          • FilterRecordReader same as before
          • classes under io.filter - same as before
          • instead of FilterInputFormat, use FilterRecordReader wrapping MapTask.TrackedRecordReader if filtering is enabled
          • Configure filters from the JobConf (delegating to FilterEngine)
          • Users may freely extend / use FilterRecordReader and Filters.

          The bulk of the implementation is in the FilterRecordReader, FilterEngine and Filter implementations, so this code (and the filtering functionality) CAN be reused. Moreover, the FilterInputFormat from the previous patch is only a wrapper. Assuming we move FilterRecordReader out to a separate class, the join framework can easily extend its grammar to accept an InputFormat which filters records from its underlying InputFormat (but that is a different issue). In this sense, the current patch does not "limit" the applicability of filtering to other parts of the system, such as using more than one InputFormat, filtering map output records, etc.; rather, it provides a filtering framework and enables the frequent use case in which we filter the input records to the job. The other use cases can indeed be implemented on top of the patch (in the case of join, it can bypass the core filter and introduce filtering at another layer).
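
          To make the shared core concrete, here is a minimal sketch of the wrapping idea (illustrative only, not the actual patch code), written against the old mapred RecordReader interface and the Filter#accept(Writable) contract from this issue:

          import java.io.IOException;
          import org.apache.hadoop.io.Writable;
          import org.apache.hadoop.mapred.RecordReader;

          // Sketch: deliver only the records whose keys the configured filter
          // accepts; rejected records are read and silently skipped.
          public class FilterRecordReader<K extends Writable, V extends Writable>
              implements RecordReader<K, V> {
            private final RecordReader<K, V> raw; // the reader being decorated
            private final Filter filter;          // combined filter expression

            public FilterRecordReader(RecordReader<K, V> raw, Filter filter) {
              this.raw = raw;
              this.filter = filter;
            }

            public boolean next(K key, V value) throws IOException {
              while (raw.next(key, value)) {
                if (filter.accept(key)) {
                  return true; // hand the accepted record to the caller
                }
              }
              return false; // underlying reader is exhausted
            }

            // everything else simply delegates to the wrapped reader
            public K createKey() { return raw.createKey(); }
            public V createValue() { return raw.createValue(); }
            public long getPos() throws IOException { return raw.getPos(); }
            public float getProgress() throws IOException { return raw.getProgress(); }
            public void close() throws IOException { raw.close(); }
          }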

          I disagree, and I cite your previous patch. Its interface was not only easier to understand than the postfix additions, but specifying the baseInputFormat was very intuitive. For users seeking to benefit from this, the difficulty delta between the library and Task implementations is so slight that I doubt it'll actually prevent someone from taking advantage of it.

          The postfix additions are irrelevant to whether filtering should be a library or not. The postfix expressions are a way to specify the filtering expression to use; that part of the API would not have changed if we had stuck with FilterInputFormat.

          Though Filters will be useful, the semantics of a FunctionFilter aren't so mysterious that people won't want to write those, too. Again, the purpose of both parameters is easily explained, and people will decide whether they should employ them or not. It seems premature to decide that there are only two types of filters, anyway. It sounds like we agree that it's a cleaner interface with only one signature for the eval; I'm just not sure I see the extensibility benefit as clearly.

          I have separated the Filter and FunctionFilter interfaces to enhance encapsulation, not extensibility. Having one eval method is a cleaner interface for core developers, who can understand how the postfix expression is evaluated using Dijkstra's algorithm in the FilterEngine class, but it is cleaner to hide all these details from a user who just wants to develop a Filter.
          Think about 3 + 4, or in postfix, 3 4 +. The 3 and 4 are values, and + is a function. Clearly, Filters (which are values) and FunctionFilters (which are functions) are different in that sense. However, 3, 4 and + are all represented as symbols so that they can be interpreted and passed to a machine evaluating the expression. Similarly, FunctionFilter implements Filter (although it is not a filter) so that it can be passed to the FilterEngine. As most users will only implement Filters, and the only boolean binary function not implemented is XOR, I see no loss of generality in hiding the gory details of FunctionFilters.
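
          For illustration, stack-based evaluation of such a postfix expression could look like the following; FunctionFilter#apply and the surrounding types are assumptions for the sketch, not the patch's actual API:

          import java.util.ArrayDeque;
          import java.util.Deque;
          import org.apache.hadoop.io.Writable;

          // Hypothetical sketch: plain Filters push the result of testing the key;
          // FunctionFilters pop their operands and push the combined result.
          boolean evaluate(Iterable<Filter> postfix, Writable key) {
            Deque<Boolean> stack = new ArrayDeque<Boolean>();
            for (Filter item : postfix) {
              if (item instanceof FunctionFilter) {
                boolean right = stack.pop();
                boolean left = stack.pop();
                stack.push(((FunctionFilter) item).apply(left, right)); // e.g. AND, OR
              } else {
                stack.push(item.accept(key));
              }
            }
            return stack.pop(); // the expression's final value
          }
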
          Maybe the best way would have been to construct the hierarchy:

          interface ExpressionItem { }
          interface Filter extends ExpressionItem { boolean accept(Writable key); }
          interface Function extends ExpressionItem { }
          FilterEngine.add(ExpressionItem item);
          

          But I had chosen not to do this, to simplify things. Anyway, as I said before, now that passing complete objects is possible thanks to HADOOP-3048, I will change the postfix expressions and develop a more intuitive way:

          Filter f1 = new RangeFilter(2, 5);
          Filter f2 = new RangeFilter(10, 20);
          Filter orFilter = new ORFilter(f1, f2);
          Job.addFilter(orFilter);
          

          Well, Chris, I share your wisdom about the patch being a library, but I only insist because I [still] think that this way might be better. Other than that, since the implicit votes are 2 to 1, I will probably implement the next version of the patch as a library including FilterInputFormat, unless someone objects.

          Chris Douglas added a comment -

          will change the postfix expressions and develop a more intuitive way...

          +1 I like this syntax. Since you're passing serialized objects with your filters, you might want to test larger expressions to make sure length limits in the Configuration aren't a problem, but hitting them seems unlikely. I don't know if we even have limits in that area, but again: it'd be worth testing. On that note, for the FunctionFilters you've defined, it might be a good idea to permit them to take an arbitrary number of arguments >=2, as in:

          Filter f1 = new RangeFilter(2, 5);
          Filter f2 = new RangeFilter(10, 20);
          Filter f3 = new RangeFilter(30, 40);
          Filter orFilter = new ORFilter(f1, f2, f3);
          

          With your new syntax, this would be easy to implement, would present more opportunities for optimization within your FilterEngine, and would be very convenient for users.
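
          For instance, a variadic ORFilter might look like the sketch below; only the Filter#accept(Writable) contract comes from this issue, the rest is assumed:

          import org.apache.hadoop.io.Writable;

          // Hypothetical variadic ORFilter: accepts a key as soon as any child does.
          public class ORFilter implements Filter {
            private final Filter[] children;

            public ORFilter(Filter... children) {
              if (children.length < 2) {
                throw new IllegalArgumentException("ORFilter needs at least two children");
              }
              this.children = children;
            }

            public boolean accept(Writable key) {
              for (Filter f : children) {
                if (f.accept(key)) {
                  return true; // short-circuit on the first accepting child
                }
              }
              return false;
            }
          }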

          Having one eval method is a cleaner interface to core developers who could understand how the postfix expression is evaluated...

          Isn't all of this hidden by the FilterEngine? I'm not sure I understand what you're asserting in this paragraph... I thought we were discussing whether or not it made sense to collapse Filters and FunctionFilters into a single Filter interface that manipulates the key/stack. By construction, you know that your FunctionFilters have either Filters or FunctionFilters as children. Once you reconstruct the tree, it's not clear to me why you'd even need a stack. The key gets passed through your tree to the child Filters, which return results to the parent, which may or may not pass the key to its other children depending on the return value. It might make sense to have a FunctionFilter base type from which your operators descend- since they share common functionality- but the additional interface seems unnecessary. Have I misunderstood you, or am I responding to your new syntax instead of the original, postfix, stack-based implementation?

          The postfix additions are irrelevant to whether filtering should be a library or not. The postfix expressions are a way to specify the filtering expression to use; that part of the API would not have changed if we had stuck with FilterInputFormat.

          Sorry, I was unclear. You're right, the postfix syntax is orthogonal to this discussion since that functionality wasn't present in the original patch. I was only pointing out that those who could benefit from Filters aren't going to be turned away because they need to use a different InputFormat, i.e. using the library poses a more familiar and less difficult problem to users than the syntax and implications of Filters.

          [library vs core in general]

          It cannot be disputed that your integration of Filters into Tasks has a negligible cost and that it does not prohibit their use elsewhere and in other frameworks. That said, the semantics of Filters match those of InputFormat precisely. At its point of integration, it does exactly what an InputFormat would effect (with one caveat concerning map counters). It also avoids any confusion about where the filtering occurs, particularly where other decorator InputFormats are applied. Though I'm sympathetic to making filtering part of every job, setting the InputFormat seems like a modest burden that happens to also fit the existing semantics in an intuitive and efficient way.
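
          To illustrate the burden in question, job setup with the library approach might look roughly like this; setBaseInputFormat appears earlier in this discussion, while setFilter and MyJob are hypothetical names for the sketch:

          // assumed setup with the library approach (old mapred API)
          JobConf conf = new JobConf(MyJob.class);
          conf.setInputFormat(FilterInputFormat.class);
          FilterInputFormat.setBaseInputFormat(conf, SequenceFileInputFormat.class);
          FilterInputFormat.setFilter(conf, new ORFilter(new RangeFilter(2, 5),
                                                         new RangeFilter(10, 20)));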

          Enis Soztutar added a comment -

          We cannot get this into 0.17; moving to 0.18.

          Harsh J added a comment -

          With PathFilter available in the API today, I don't see value in adding this.

          Closing out for now; feel free to reopen if this is still required. Better as a new discussion, since this one has gone stale.
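
          For reference, a PathFilter selects which input files a job reads (rather than filtering individual records); typical usage with the old API looks roughly like this, where NoTmpFiles is a made-up example class:

          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.fs.PathFilter;
          import org.apache.hadoop.mapred.FileInputFormat;
          import org.apache.hadoop.mapred.JobConf;

          // Example PathFilter: skip temporary files when listing job input.
          public class NoTmpFiles implements PathFilter {
            public boolean accept(Path path) {
              return !path.getName().endsWith(".tmp");
            }
          }

          // in job setup:
          // JobConf conf = new JobConf();
          // FileInputFormat.setInputPathFilter(conf, NoTmpFiles.class);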


            People

            • Assignee: Enis Soztutar
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 2