  1. Hadoop Map/Reduce
  2. MAPREDUCE-350

Generalize the SequenceFileInputFilter to apply to any InputFormat

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I'd like to generalize the SequenceFileInputFilter that was introduced in HADOOP-412 so that it can be applied to any InputFormat. To do this, I propose:

      interface WritableFilter {
        boolean accept(Writable item);
      }

      class FilterInputFormat implements InputFormat {
        ...
      }

      FilterInputFormat would look in the JobConf for:
      mapred.input.filter.source = the underlying input format
      mapred.input.filter.filters = a list of class names that implement WritableFilter
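
      As a rough illustration of the lookup described above, the following sketch reads the two proposed keys. It is not taken from the attached patches: java.util.Properties stands in for JobConf so the example compiles without Hadoop, the class name FilterConfigSketch is hypothetical, and the comma as list delimiter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper; Properties stands in for JobConf. */
final class FilterConfigSketch {
    // The two keys named in the proposal.
    static final String SOURCE_KEY = "mapred.input.filter.source";
    static final String FILTERS_KEY = "mapred.input.filter.filters";

    /** Returns the class name of the underlying input format, or null if unset. */
    static String sourceFormat(Properties conf) {
        String value = conf.getProperty(SOURCE_KEY);
        return value == null ? null : value.trim();
    }

    /** Splits the filter list on commas (assumed delimiter), dropping blank entries. */
    static List<String> filterClassNames(Properties conf) {
        List<String> names = new ArrayList<>();
        String raw = conf.getProperty(FILTERS_KEY, "");
        for (String part : raw.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

      A real implementation would presumably go on to instantiate each named class reflectively; that step is left out here.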

      The FilterInputFormat will work like the current SequenceFileInputFilter, but will read from an internal RecordReader rather than from a SequenceFile. This requires adding next(key) and getCurrentValue(value) methods to the RecordReader interface, which will be addressed in a separate issue.
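
      A minimal sketch of how such a wrapping reader might behave, under stated assumptions: SketchWritable and SketchRecordReader are hypothetical stand-ins for Hadoop's Writable and RecordReader, filters are applied to the key only, and a record passes only if every filter accepts it. None of this is confirmed by the attached patches.

```java
import java.util.List;

/** Stand-in for org.apache.hadoop.io.Writable so the sketch compiles without Hadoop. */
interface SketchWritable {}

/** The filter interface from the proposal, retargeted at the stand-in type. */
interface WritableFilter {
    boolean accept(SketchWritable item);
}

/** Hypothetical reader exposing the two methods the proposal wants on RecordReader. */
interface SketchRecordReader<K extends SketchWritable, V> {
    boolean next(K key);            // advance and fill in the key only
    void getCurrentValue(V value);  // fill in the value for the current key
}

/** Simple mutable key and value holders used for illustration. */
final class TextKey implements SketchWritable { String text = ""; }
final class TextValue { String text = ""; }

/** Wraps a source reader and skips records whose key is rejected by any filter. */
final class FilteringReader<K extends SketchWritable, V> {
    private final SketchRecordReader<K, V> source;
    private final List<WritableFilter> filters;

    FilteringReader(SketchRecordReader<K, V> source, List<WritableFilter> filters) {
        this.source = source;
        this.filters = filters;
    }

    /** Reads keys until one passes all filters, then loads its value. */
    boolean next(K key, V value) {
        while (source.next(key)) {
            if (acceptsAll(key)) {
                source.getCurrentValue(value); // value is only read for accepted keys
                return true;
            }
        }
        return false; // source exhausted
    }

    private boolean acceptsAll(K key) {
        for (WritableFilter filter : filters) {
            if (!filter.accept(key)) {
                return false;
            }
        }
        return true;
    }
}

/** In-memory source over {key, value} string pairs, for trying the sketch out. */
final class InMemoryReader implements SketchRecordReader<TextKey, TextValue> {
    private final String[][] records;
    private int pos = -1;

    InMemoryReader(String[][] records) { this.records = records; }

    public boolean next(TextKey key) {
        if (pos + 1 >= records.length) {
            return false;
        }
        pos++;
        key.text = records[pos][0];
        return true;
    }

    public void getCurrentValue(TextValue value) {
        value.text = records[pos][1];
    }
}
```

      Splitting the read into next(key) and getCurrentValue(value) lets the wrapper skip loading values for rejected keys, which is the likely motivation for the interface change mentioned above.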

        Attachments

        1. filtering_v3.patch
          97 kB
          Enis Soztutar
        2. filtering_v2.patch
          144 kB
          Enis Soztutar
        3. filterinputformat_v1.patch
          63 kB
          Enis Soztutar

              People

              • Assignee:
                enis Enis Soztutar
                Reporter:
                owen.omalley Owen O'Malley
              • Votes:
                0
                Watchers:
                2
