Hadoop Common / HADOOP-1204

Re-factor InputFormat/RecordReader related classes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      This Jira is a first small step toward unifying the InputFormat/RecordReader code used by Hadoop streaming
      with the main Hadoop framework.

      It makes the following cleanups to the related parts of the main framework.

      1. Add a constructor
      public LineRecordReader(Configuration job, FileSplit split)
      to LineRecordReader. This gives SequenceFileRecordReader and LineRecordReader constructors
      with the same signature, which makes it easier to write a factory class that creates the various
      record readers once the record reader classes for Hadoop streaming are brought into the main framework.

      2. Implement the next() method using the following newly added protected method of the LineRecordReader class:

      protected long readLine() throws IOException {
        return LineRecordReader.readLine(in, buffer);
      }

      This allows a subclass to easily override the readLine logic to use a different line terminator (e.g. treat '\r' as part of the data, not as a line break).

      3. Rename class InputFormatBase to FileInputFormat to better reflect what the class does.
      For backward compatibility, keep the InputFormatBase class, but make it a deprecated shell class that simply extends FileInputFormat.

      4. Change TextInputFormat and SequenceFileInputFormat to extend FileInputFormat.
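
      The overridable readLine() hook from item 2 can be sketched as below. This is a minimal, self-contained illustration and not the code in the patch: SketchLineReader, LfOnlyLineReader, readAll and stream are hypothetical names, the reader works on a plain InputStream instead of a FileSplit, and the default rule (a line ends at '\n', '\r', or "\r\n") is an assumption about the base behavior.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineReaderSketch {

    /** Hypothetical stand-in for LineRecordReader; NOT the Hadoop class. */
    public static class SketchLineReader {
        protected final InputStream in;   // BufferedInputStream, so mark/reset work
        protected final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        public SketchLineReader(InputStream raw) {
            this.in = new BufferedInputStream(raw);
        }

        /**
         * Overridable hook. Assumed default rule: a line ends at '\n', '\r',
         * or "\r\n". Returns the number of bytes consumed (0 at end of input).
         */
        protected long readLine() throws IOException {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;
                if (b == '\n') break;
                if (b == '\r') {
                    in.mark(1);                       // peek for a following '\n'
                    int next = in.read();
                    if (next == '\n') consumed++;
                    else if (next != -1) in.reset();  // un-read the peeked byte
                    break;
                }
                buffer.write(b);
            }
            return consumed;
        }

        /** next() only delegates to the hook, so subclasses change line rules. */
        public String next() throws IOException {
            buffer.reset();
            if (readLine() == 0) return null;
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Subclass that treats '\r' as data and breaks lines on '\n' only. */
    public static class LfOnlyLineReader extends SketchLineReader {
        public LfOnlyLineReader(InputStream raw) {
            super(raw);
        }

        @Override
        protected long readLine() throws IOException {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;
                if (b == '\n') break;
                buffer.write(b);                      // '\r' is kept as data
            }
            return consumed;
        }
    }

    /** Drains a reader into a list of lines. */
    public static List<String> readAll(SketchLineReader reader) {
        List<String> lines = new ArrayList<>();
        try {
            for (String line = reader.next(); line != null; line = reader.next()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Wraps a string as a UTF-8 byte stream. */
    public static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String data = "a\rb\nc";
        System.out.println(readAll(new SketchLineReader(stream(data))).size());  // 3
        System.out.println(readAll(new LfOnlyLineReader(stream(data))).size());  // 2
    }
}
```

      Because next() never inspects line terminators itself, a subclass only has to replace readLine() to change how records are delimited, which is the flexibility item 2 describes.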

      Attachments

      1. patch-1204.txt (19 kB, Runping Qi)

        Activity

        Runping Qi added a comment -


        Made the proposed changes in the patch. All unit tests passed.

        Hadoop QA added a comment -

        -1, because 3 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12354957/patch-1204.txt against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525596. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Runping Qi added a comment -

        My bad.
        Forgot to include the new file.

        Runping Qi added a comment -


        Added the new files to the patch

        Runping Qi added a comment -

        Resubmitting the corrected patch.

        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12354976/patch-1204.txt applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525596 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Owen O'Malley added a comment -

        This looks good, except you have some spurious whitespace differences.

        Runping Qi added a comment -

        a new patch fixing a few places in comments and extra spaces

        Runping Qi added a comment -

        one more spurious space fixed

        Runping Qi added a comment -

        a new patch fixing some comments and spurious spaces

        Hadoop QA added a comment -

        -1

        2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12355268/patch-1204.txt against trunk revision r527100.

        Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/23/console

        Runping Qi added a comment -

        I took a look at the log.
        It seems to me that the failure has nothing to do with this patch.

        An earlier version of this patch passed the tests (see above). I fixed a few spurious spaces and resubmitted it yesterday. All the unit tests passed on my dev machine.

        Since I have another fairly large patch that depends on this one, can somebody help verify this patch and, if no problems are found, commit it? Thanks a lot.

        Owen O'Malley added a comment -

        +1

        Doug Cutting added a comment -

        I just committed this. Thanks, Runping!

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #55 (see http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/55/).

          People

          • Assignee: Runping Qi
          • Reporter: Runping Qi
          • Votes: 0
          • Watchers: 0
