Hadoop Common / HADOOP-1204

Re-factor InputFormat/RecordReader related classes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      This issue is a first small step toward unifying the InputFormat/RecordReader code used by Hadoop streaming
      with that of the main Hadoop framework.

      It makes the following cleanups to the related parts of the main framework.

      1. Add a constructor
      public LineRecordReader(Configuration job, FileSplit split)
      to LineRecordReader. This gives the constructors of SequenceFileRecordReader and LineRecordReader
      the same signature, which makes it easier to write a factory class that creates the various record readers
      once the record reader classes for Hadoop streaming are brought into the main framework.

      2. Implement the next() method using the following newly added protected method of the LineRecordReader class:

      protected long readLine() throws IOException {
        return LineRecordReader.readLine(in, buffer);
      }

      This lets a subclass easily override the readLine logic to use a different line terminator (e.g. treat '\r' as part of the data rather than as a line terminator).

      3. Rename class InputFormatBase to FileInputFormat to better reflect what the class does.
      For backward compatibility, keep the InputFormatBase class, but make it a deprecated, thin subclass that simply extends FileInputFormat.

      4. Change TextInputFormat and SequenceFileInputFormat to extend FileInputFormat.
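
      The patterns behind items 1 to 3 can be sketched in plain Java. This is only an illustration under
      stated assumptions, not the code in the attached patch: SimpleLineReader, NewlineOnlyReader,
      FileFormatBase, LegacyFormatBase and the create() factory are hypothetical stand-ins that take a
      java.io.Reader instead of (Configuration, FileSplit), and the three-argument readLine helper is
      invented for the sketch. It shows a shared constructor signature enabling a reflective factory, a
      protected readLine() hook that a subclass overrides to keep '\r' as data, and an old class name
      kept as a deprecated empty subclass.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these are the real Hadoop classes.
class RecordReaderSketch {

    /** Simplified stand-in for LineRecordReader. */
    static class SimpleLineReader {
        protected final Reader in;
        protected final StringBuilder buffer = new StringBuilder();

        SimpleLineReader(Reader in) { this.in = in; }

        /** Overridable hook: fills buffer with one line, returns characters consumed (0 at end of input). */
        protected long readLine() throws IOException {
            return readLine(in, buffer, true);
        }

        /**
         * Shared helper (assumed signature). Stops at '\n', and also at '\r' when crEndsLine is true.
         * Simplification: a "\r\n" pair is not coalesced into one terminator.
         */
        static long readLine(Reader in, StringBuilder out, boolean crEndsLine) throws IOException {
            out.setLength(0);
            long consumed = 0;
            int c;
            while ((c = in.read()) != -1) {
                consumed++;
                if (c == '\n' || (crEndsLine && c == '\r')) {
                    break;
                }
                out.append((char) c);
            }
            return consumed;
        }

        /** next() is built on the hook, so subclasses only need to override readLine(). */
        public String next() throws IOException {
            return readLine() == 0 ? null : buffer.toString();
        }
    }

    /** Subclass that treats '\r' as ordinary data by overriding the hook. */
    static class NewlineOnlyReader extends SimpleLineReader {
        NewlineOnlyReader(Reader in) { super(in); }

        @Override
        protected long readLine() throws IOException {
            return readLine(in, buffer, false);
        }
    }

    /** Because every reader has a (Reader) constructor, one reflective factory can build any of them. */
    static SimpleLineReader create(Class<? extends SimpleLineReader> cls, Reader in)
            throws ReflectiveOperationException {
        return cls.getDeclaredConstructor(Reader.class).newInstance(in);
    }

    /** Stand-in for the renamed base class (FileInputFormat in the description). */
    static class FileFormatBase {
        String describe() { return "file-based input"; }
    }

    /** Old name kept as a deprecated, empty subclass so existing subclasses still compile. */
    @Deprecated
    static class LegacyFormatBase extends FileFormatBase { }

    /** Drains a reader into a list of lines. */
    static List<String> readAll(SimpleLineReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String line = reader.next(); line != null; line = reader.next()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String data = "a\rb\nc\n";
        // Default reader splits on both '\r' and '\n': three lines.
        System.out.println(readAll(create(SimpleLineReader.class, new StringReader(data))).size());
        // Overriding reader keeps '\r' inside the first line: two lines.
        System.out.println(readAll(create(NewlineOnlyReader.class, new StringReader(data))).size());
    }
}
```

      With the input "a\rb\nc\n", the default reader yields three lines and the overriding reader yields
      two, the first of which still contains the '\r'.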

      1. patch-1204.txt
        19 kB
        Runping Qi


          People

          • Assignee:
            Runping Qi
            Reporter:
            Runping Qi
