Hadoop Map/Reduce / MAPREDUCE-350

Generalize the SequenceFileInputFilter to apply to any InputFormat

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid

    Description

      I'd like to generalize the SequenceFileInputFilter that was introduced in HADOOP-412 so that it can be applied to any InputFormat. To do this, I propose:

      interface WritableFilter {
        boolean accept(Writable item);
      }

      class FilterInputFormat implements InputFormat {
        ...
      }
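
As an illustration of the proposed interface, a key filter driven by a regular expression might look like the sketch below. `Writable` here is an empty stand-in for org.apache.hadoop.io.Writable so that the example compiles without Hadoop on the classpath; `TextKey` and `RegexWritableFilter` are hypothetical names chosen for this example and are not taken from any of the attached patches.

```java
import java.util.regex.Pattern;

// Stand-in for org.apache.hadoop.io.Writable (assumption: the real interface
// also declares serialization methods, omitted to keep the sketch standalone).
interface Writable {}

// The interface proposed in this issue.
interface WritableFilter {
    boolean accept(Writable item);
}

// Hypothetical key type wrapping a string, playing the role of a text key.
final class TextKey implements Writable {
    private final String value;

    TextKey(String value) { this.value = value; }

    @Override public String toString() { return value; }
}

// Hypothetical filter: accepts an item when its string form contains
// a match for the configured regular expression.
final class RegexWritableFilter implements WritableFilter {
    private final Pattern pattern;

    RegexWritableFilter(String regex) { this.pattern = Pattern.compile(regex); }

    @Override public boolean accept(Writable item) {
        return pattern.matcher(item.toString()).find();
    }
}
```

Because the filter only sees a Writable, the same class could be applied to keys from any input format whose keys have a meaningful string form.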

      FilterInputFormat would look in the JobConf for:
      mapred.input.filter.source = the underlying input format
      mapred.input.filter.filters = a list of class names that implement WritableFilter
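
A minimal sketch of how the filter list could be read and instantiated is shown below. It makes several assumptions: java.util.Properties stands in for JobConf, the class names are comma-separated, and every filter has a no-argument constructor. `FilterConfig`, `AcceptAllFilter`, and `RejectAllFilter` are hypothetical names; only the two property keys come from the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Stand-ins for the Hadoop Writable type and the proposed filter interface.
interface Writable {}

interface WritableFilter {
    boolean accept(Writable item);
}

// Two trivial filters so the loader has something to instantiate.
final class AcceptAllFilter implements WritableFilter {
    @Override public boolean accept(Writable item) { return true; }
}

final class RejectAllFilter implements WritableFilter {
    @Override public boolean accept(Writable item) { return false; }
}

// Hypothetical helper that resolves the configured filter classes.
final class FilterConfig {
    static final String SOURCE_KEY = "mapred.input.filter.source";
    static final String FILTERS_KEY = "mapred.input.filter.filters";

    private FilterConfig() {}

    // Assumption: the property holds a comma-separated list of class names.
    static List<WritableFilter> loadFilters(Properties conf) {
        List<WritableFilter> filters = new ArrayList<>();
        for (String name : conf.getProperty(FILTERS_KEY, "").split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate an unset property or stray commas
            }
            try {
                Class<? extends WritableFilter> cls =
                        Class.forName(trimmed).asSubclass(WritableFilter.class);
                filters.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load filter " + trimmed, e);
            }
        }
        return filters;
    }
}
```

The filters are returned in the order they are listed, so a cheap filter can be placed ahead of an expensive one.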

      The FilterInputFormat will work like the current SequenceFileInputFilter, but will use an internal RecordReader rather than reading the SequenceFile directly. This requires adding next(key) and getCurrentValue(value) methods to the RecordReader interface, which will be addressed in a separate issue.
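
The reason for splitting key and value reads can be sketched as follows: the wrapper reads only the key, consults the filters, and fetches the value only for records that pass. This is a sketch under stated assumptions, not the code in the attached patches. `SplitRecordReader` is a hypothetical stand-in for a RecordReader extended with the two proposed methods, `FilteringReader`, `StringBox`, and `InMemoryReader` are illustrative names, and requiring every filter to accept (a logical AND) is an assumption, since the proposal does not say how multiple filters combine.

```java
import java.util.List;

// Stand-ins for the Hadoop Writable type and the proposed filter interface.
interface Writable {}

interface WritableFilter {
    boolean accept(Writable item);
}

// Hypothetical reader exposing the two methods the proposal asks for.
interface SplitRecordReader<K extends Writable, V extends Writable> {
    boolean next(K key);            // fills key; returns false at end of input
    void getCurrentValue(V value);  // fills value for the most recent key
}

// Wraps a source reader and skips records whose key is rejected.
final class FilteringReader<K extends Writable, V extends Writable> {
    private final SplitRecordReader<K, V> source;
    private final List<WritableFilter> filters;

    FilteringReader(SplitRecordReader<K, V> source, List<WritableFilter> filters) {
        this.source = source;
        this.filters = filters;
    }

    boolean next(K key, V value) {
        while (source.next(key)) {
            if (acceptedByAll(key)) {
                source.getCurrentValue(value); // value is read only for accepted keys
                return true;
            }
        }
        return false;
    }

    // Assumption: a record passes only if every filter accepts its key.
    private boolean acceptedByAll(K key) {
        for (WritableFilter f : filters) {
            if (!f.accept(key)) {
                return false;
            }
        }
        return true;
    }
}

// Mutable string holder, mimicking how Hadoop reuses key and value objects.
final class StringBox implements Writable {
    String value;
}

// In-memory source over {key, value} pairs; counts how many values are read.
final class InMemoryReader implements SplitRecordReader<StringBox, StringBox> {
    private final String[][] records;
    private int position = -1;
    int valuesRead = 0;

    InMemoryReader(String[][] records) { this.records = records; }

    @Override public boolean next(StringBox key) {
        if (position + 1 >= records.length) {
            return false;
        }
        position++;
        key.value = records[position][0];
        return true;
    }

    @Override public void getCurrentValue(StringBox value) {
        value.value = records[position][1];
        valuesRead++;
    }
}
```

With a combined next(key, value) call the value would be read for every record, including those the filter then discards; the split lets the wrapper skip that work for rejected keys.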

      Attachments

        1. filtering_v2.patch
          144 kB
          Enis Soztutar
        2. filtering_v3.patch
          97 kB
          Enis Soztutar
        3. filterinputformat_v1.patch
          63 kB
          Enis Soztutar

            People

              Enis Soztutar
              Owen O'Malley
              Votes: 0
              Watchers: 2
