Hadoop Map/Reduce
MAPREDUCE-1122

streaming with custom input format does not support the new API

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels: None
    • Environment: any OS
    • Hadoop Flags: Incompatible change

      Description

      When trying to implement a custom input format for use with streaming, I have found that streaming does not support the new API, org.apache.hadoop.mapreduce.InputFormat, but requires the old API, org.apache.hadoop.mapred.InputFormat.
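      For illustration, this is the shape a custom input format currently has to take
      for streaming to accept it via -inputformat: a minimal sketch against the old
      API (the class name is hypothetical). An equivalent class extending
      org.apache.hadoop.mapreduce.InputFormat is rejected, which is the bug reported
      here.

      import java.io.IOException;

      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapred.FileInputFormat;
      import org.apache.hadoop.mapred.FileSplit;
      import org.apache.hadoop.mapred.InputSplit;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.LineRecordReader;
      import org.apache.hadoop.mapred.RecordReader;
      import org.apache.hadoop.mapred.Reporter;

      // Old-API (org.apache.hadoop.mapred) input format, as streaming requires.
      public class MyStreamingInputFormat extends FileInputFormat<LongWritable, Text> {
        @Override
        public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
          reporter.setStatus(split.toString());
          return new LineRecordReader(job, (FileSplit) split);
        }
      }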

      Attachments

      1. patch-1122.txt (236 kB) - Amareshwari Sriramadasu
      2. patch-1122-1.txt (271 kB) - Amareshwari Sriramadasu


          Activity

          Jaideep added a comment -

          Some changes that are needed in order to support this:

          • Everywhere in StreamJob, o.a.h.mapred.JobConf is used. To allow
            new input and output formats, a new o.a.h.mapreduce.Job object should be
            used instead. Alternatively, we can create and set the configuration without
            relying on JobConf or Job methods, and only create a JobConf or Job
            object depending upon whether the old or new API is being used (see the
            sketch after this list).
          • PipeMapper and PipeReducer are also based on the old API. We will have
            to create new Mappers and Reducers based on the new API in order to
            support newer input and output formats. PipeMapRed also uses JobConf in
            a number of places. Almost all of these calls could be replaced by calls
            to a Configuration object.
          • StreamInputFormat extends o.a.h.mapred.KeyValueTextInputFormat. It
            should extend o.a.h.mapreduce.lib.input.KeyValueTextInputFormat.
          • StreamBaseRecordReader extends o.a.h.mapred.RecordReader. A new class
            conforming to the new API is needed.
          • Some static methods in StreamUtil.java are using the old API:
            • getCurrentSplit - uses o.a.h.mapred.FileSplit and JobConf. This
              method is not used anywhere else in the code.
            • isLocalJobTracker - uses JobConf.
            • getTaskInfo - uses JobConf to get the type of a task and the taskid. Used
              in PipeMapRed.setStreamJobDetails to set the taskid.
            • addJobConfToEnvironment - takes a JobConf as argument. It should also
              take a Job.
            There is also a static TaskID class in StreamUtil.java. If it's not needed, can it be removed?
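          A minimal sketch of the alternative suggested in the first bullet above:
          build a plain Configuration first, and wrap it in a JobConf or Job only once
          we know which API is in play. The class and method names here are
          illustrative, not from any patch.

          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.mapred.JobConf;
          import org.apache.hadoop.mapreduce.Job;

          public class StreamJobSketch {
            // Hypothetical helper: returns a JobConf (old API) or a Job (new API).
            static Object createJobObject(Configuration conf, boolean useOldApi)
                throws IOException {
              if (useOldApi) {
                JobConf jobConf = new JobConf(conf);
                jobConf.setInputFormat(org.apache.hadoop.mapred.KeyValueTextInputFormat.class);
                return jobConf;
              }
              Job job = new Job(conf);
              job.setInputFormatClass(
                  org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);
              return job;
            }
          }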
          Amareshwari Sriramadasu added a comment -

          Users can specify the Mapper/Reducer to be a Java Mapper/Reducer or a command. They can also specify the input format, output format and partitioner for their streaming job. The tables below summarize the mapper or reducer in use once streaming supports both the old and new API.

          Note: In the tables below, NS stands for 'Not specified' and '-' stands for 'any value'.

          Table 1. Mapper-in-use for a given spec, when num reducers = 0:

          | Mapper  | InputFormat | OutputFormat | Valid conf? | Mapper-in-use |
          |---------|-------------|--------------|-------------|---------------|
          | Command | NS          | NS           | Yes         | New           |
          | Command | Old         | NS           | Yes         | Old           |
          | Command | Old         | Old          | Yes         | Old           |
          | Command | Old         | New          | No          |               |
          | Command | New         | NS           | Yes         | New           |
          | Command | New         | Old          | No          |               |
          | Command | New         | New          | Yes         | New           |
          | Old     | NS          | NS           | Yes         | Old           |
          | Old     | NS          | Old          | Yes         | Old           |
          | Old     | Old         | NS           | Yes         | Old           |
          | Old     | Old         | Old          | Yes         | Old           |
          | Old     | -           | New          | No          |               |
          | Old     | New         | -            | No          |               |
          | New     | NS          | NS           | Yes         | New           |
          | New     | NS          | New          | Yes         | New           |
          | New     | New         | NS           | Yes         | New           |
          | New     | New         | New          | Yes         | New           |
          | New     | -           | Old          | No          |               |
          | New     | Old         | -            | No          |               |

          Table 2. Mapper-in-use for a given spec, when num reducers != 0:

          | Mapper  | InputFormat | Partitioner | Valid conf? | Mapper-in-use |
          |---------|-------------|-------------|-------------|---------------|
          | Command | NS          | NS          | Yes         | New           |
          | Command | Old         | NS          | Yes         | Old           |
          | Command | Old         | Old         | Yes         | Old           |
          | Command | Old         | New         | No          |               |
          | Command | New         | NS          | Yes         | New           |
          | Command | New         | Old         | No          |               |
          | Command | New         | New         | Yes         | New           |
          | Old     | NS          | NS          | Yes         | Old           |
          | Old     | NS          | Old         | Yes         | Old           |
          | Old     | Old         | NS          | Yes         | Old           |
          | Old     | Old         | Old         | Yes         | Old           |
          | Old     | New         | -           | No          |               |
          | Old     | -           | New         | No          |               |
          | New     | NS          | NS          | Yes         | New           |
          | New     | NS          | New         | Yes         | New           |
          | New     | New         | NS          | Yes         | New           |
          | New     | New         | New         | Yes         | New           |
          | New     | Old         | -           | No          |               |
          | New     | -           | Old         | No          |               |

          Table 3. Reducer-in-use for a given spec:

          | Reducer | OutputFormat | Valid conf? | Reducer-in-use |
          |---------|--------------|-------------|----------------|
          | Command | NS           | Yes         | New            |
          | Command | Old          | Yes         | Old            |
          | Command | New          | Yes         | New            |
          | Old     | NS           | Yes         | Old            |
          | New     | NS           | Yes         | New            |
          | Old     | Old          | Yes         | Old            |
          | New     | New          | Yes         | New            |
          | Old     | New          | No          |                |
          | New     | Old          | No          |                |
          Amareshwari Sriramadasu added a comment -

          To support the new API in streaming, the implementation involves two major tasks:

          1. Setting the job configuration for the streaming job: set the appropriate mapper and reducer depending on the arguments passed. Summarizing the requirements tables above (see the sketch after this list):
            • The old-API mapper, PipeMapper, is used as the mapper for the job only if the mapper is a command and
              a) an old-API input format is passed, or
              b) #reduces = 0 and an old-API output format is passed, or
              c) #reduces != 0 and an old-API partitioner is passed.
            • Similarly, the old-API reducer, PipeReducer, is used as the reducer for the job only if the reducer is a command and an old-API output format is passed.
          2. Implementation of the new-API streaming mapper, reducer, etc.
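          A condensed sketch of the selection rule in task 1 (not the patch's actual
          code; the class and method names are hypothetical):

          public class MapperSelectionSketch {
            // true = use the old-API PipeMapper; false = use the new-API mapper.
            // A Java mapper is used directly, so PipeMapper never applies to it.
            static boolean useOldApiMapper(boolean mapperIsCommand,
                boolean oldInputFormat, int numReduces,
                boolean oldOutputFormat, boolean oldPartitioner) {
              if (!mapperIsCommand) {
                return false;
              }
              return oldInputFormat
                  || (numReduces == 0 && oldOutputFormat)
                  || (numReduces != 0 && oldPartitioner);
            }
          }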
          Amareshwari Sriramadasu added a comment -

          Attaching a patch which does the following:

          • Deprecates all the library classes in streaming, such as AutoInputFormat, StreamInputFormat, StreamXmlRecordReader etc., and adds new classes which use the new API.
          • Changes the tools DumpTypedBytes and LoadTypedBytes to use new API classes.
          • Adds StreamJobConfig holding all the configuration properties used in streaming.
          • Adds classes StreamingMapper, StreamingReducer and StreamingCombiner which extend the new API Mapper and Reducer classes.
            • Adds a class StreamingProcess which starts the streaming process and the MR output/error threads, waits for the threads, etc. This functionality is in PipeMapRed.java for the old-API mapper/reducer; PipeMapper and PipeReducer extend PipeMapRed and implement the old Mapper/Reducer interfaces. We cannot make StreamingMapper/StreamingReducer extend StreamingProcess because in the new API, Mapper and Reducer are classes, not interfaces. So this was moved into a separate class that StreamingMapper/StreamingReducer compose (see the sketch after this list).
            • InputWriter and OutputReader, added in HADOOP-1722, take a PipeMapRed instance as a constructor parameter. But that no longer makes sense, because the process handling is served by a separate class, StreamingProcess, for the new-API mapper/reducer. So I made the following incompatible change (looks clean now):
              • Changes the OutputReader constructor to take a DataInput as parameter, instead of PipeMapRed
              • Changes the InputWriter constructor to take a DataOutput as parameter, instead of PipeMapRed
          • Moves some utility methods in PipeMapRed to StreamUtil.
          • Removes the deprecated StreamJob(String[] argv, boolean mayExit); deprecates static public JobConf createJob(String[] argv); and adds static public Job createStreamingJob(String[] argv).
          • Refactors setJobConf() into multiple setters to set the appropriate mapper/reducer in use.
          • Adds unit tests for all the use cases described above.
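          A rough sketch of the composition described above. The stub stands in for
          the patch's StreamingProcess; its method names are hypothetical.

          import java.io.IOException;

          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;

          // Stand-in for the patch's StreamingProcess class.
          class StreamingProcessStub {
            void start() { /* launch the streaming command and its I/O threads */ }
            void waitFor() { /* join the threads and check the exit code */ }
          }

          public class StreamingMapperSketch extends Mapper<Object, Text, Text, Text> {
            // Composition instead of inheritance: the new-API Mapper is a class,
            // so StreamingMapper cannot also extend StreamingProcess.
            private final StreamingProcessStub process = new StreamingProcessStub();

            @Override
            protected void setup(Context context) throws IOException {
              process.start();
            }

            @Override
            protected void cleanup(Context context) throws IOException {
              process.waitFor();
            }
          }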
          Amareshwari Sriramadasu added a comment -

          Patch is ready for review.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12448755/patch-1122.txt
          against trunk revision 960808.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 92 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/287/console

          This message is automatically generated.

          Amareshwari Sriramadasu added a comment -

          -1 contrib tests.

          The failure is because of MAPREDUCE-1834.

          Amareshwari Sriramadasu added a comment -

          Forgot to mention that the skipping-bad-records functionality is not added for the new API classes, because the support is not there for the new API in the framework itself (MAPREDUCE-1932).

          Vinod Kumar Vavilapalli added a comment -

          I started looking at this patch. It's BIG, and since streaming isn't exactly my 'home-ground', I had to spend quite some time reviewing it. Please bear with me; we will need to go through some iterations, at least one more big one, to get this closed.

          • First up, the patch needs some merging to accommodate recent commits in streaming.
          • It'd be really good if we could separate the new classes into new packages: library classes into a lib package and implementation classes into an impl package.
          • There are two ways of handling the skipping of bad records in the new API: (1) put the code in place and document that it isn't supported yet, so that whenever MAPREDUCE-1932 moves in, skipping automatically works in streaming, or (2) remove the code altogether and create a child issue of MAPREDUCE-1932 for streaming. It looks like you intended to do (2), but I do see some (dead) code related to skipping in the new API classes, e.g. in StreamingMapper. We should choose either (1) completely or (2) completely.

          Otherwise the overall functionality looks good to me, including correctness. Just some minor comments.

          StreamingMapper.java

          • This log statement is new, and we are doing it for every key. Too aggressive? (See the sketch below.)
            LOG.info("input " + key + " "+ value);
          • Difference in logging compared to the old PipeMapRed class when exceptions happen in map.
          • Missing @Override annotation for overridden methods.
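          For the log statement, one conventional fix inside map(), sketched here
          assuming the commons-logging Log the streaming classes already use:

          // Guard the per-record statement (or drop it to debug level) so the
          // string concatenation costs nothing at the default log level.
          if (LOG.isDebugEnabled()) {
            LOG.debug("input " + key + " " + value);
          }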

          StreamingReducer.java

          • The exit code is not logged when exceptions happen in reduce. It used to be logged in the old code.
          • Missing @Override annotation for overridden methods.

          How about passing the configuration to InputWriter.initialize() and letting TextInputWriter/TextOutputReader maintain the key/value separators and related information themselves, instead of polluting StreamingMapper and StreamingReducer? (A sketch follows.)
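          A hypothetical shape of that suggestion (today the writers are initialized
          from a PipeMapRed instance; the class and signature below are not from the
          patch):

          import java.io.DataOutput;
          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;

          // Hypothetical sketch: each writer configures itself from the job
          // configuration instead of being handed the separators by the mapper.
          public abstract class ConfiguredInputWriter<K, V> {
            protected byte[] keyValueSeparator;

            public void initialize(DataOutput out, Configuration conf) throws IOException {
              keyValueSeparator =
                  conf.get("stream.map.input.field.separator", "\t").getBytes("UTF-8");
            }

            public abstract void writeKey(K key) throws IOException;
            public abstract void writeValue(V value) throws IOException;
          }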

          StreamingCombiner

          • Missing @Override annotation for the overridden method.

          AutoInputFormat2

          • No configure method like in AutoInputFormat?
          • Name? Once we move the lib classes to a new package, this class can simply keep the old name AutoInputFormat.

          StreamingXmlRecordReader.java

          • The Log.info statement in init() bears the wrong (parent) class name.
          • Should nextKeyValue() be synchronized? In the old API it was.

          StreamingBaseRecordReader.java

          • getStatus() has changed w.r.t. printing 'pos' when compared to the older StreamBaseRecordReader.java.

          StreamJob.java

          • Removes deprecated StreamJob(String[] argv, boolean mayExit);

            Just checking: has the compatibility been left in for one release?

          • Same for StreamJob.go()?
          • The boolean isOldIF argument to setOutputFormat is not used at all.
          • Cluster, and hence StreamJob, never close the client connection themselves at all! (Maybe another ticket.)

          TestStreamingStatus:

          • + //testStreamJob(false);// nonempty input
            Commented out intentionally, in testReporing()?
          • The comments from +262 to +265 are no longer valid, right?

          TrApp.java

          • Some expect() and expectDefined() calls are dropped. I could understand why the ones related to the output format are dropped, to accommodate testing both the new and old APIs. But removing the checks related to the input file and file length didn't make sense to me.

          TaskInputOutputContextImpl

          • The changes here were a surprise to me. They should be related to MAPREDUCE-1905. Are you incorporating that here, or did you just keep them in the patch for running? If it's the latter, please provide a patch without these changes. If it's the former, we will need to include the testcase from there too.

          Miscellaneous comments:

          • It's the right time for us to mark all the touched classes/interfaces according to the classification taxonomy.
          • Should we make the initialize methods in InputWriter and OutputReader abstract now?
          • The TestStreamingAPICompatibility class needs some javadoc.
          • TODO: In the end we need to be sure the tests pass with LinuxTaskController as well. Please do this with your next patch if you haven't already.
          Jeremy Hanna added a comment -

          Is there any update on this? It's kind of a pain to have to support the old and new API in a custom InputFormat/RecordReader in order to enable streaming.

          Amareshwari Sriramadasu added a comment -

          Will upload a new patch soon.

          Amareshwari Sriramadasu added a comment -

          The patch is updated to trunk with most of the review comments incorporated. It should be applied on top of MAPREDUCE-1905 to pass all tests.

          It'd be really good if we could separate the new classes into new packages: library classes into a lib package and implementation classes into an impl package.

          Done

          There are two ways of handling the skipping of bad records in the new API ...

          Removed the dead code related to skipping in the new API classes. Will add a subtask to MAPREDUCE-1932 to add streaming support.

          StreamingReducer.java

          The exit code is not logged when exceptions happen in reduce. It used to be logged in the old code.

          The exit code is already logged in StreamingProcessManager. Even in the old code, it was getting logged twice.

          How about passing the configuration to InputWriter.initialize() and letting TextInputWriter/TextOutputReader maintain the key/value separators and related information themselves, instead of polluting StreamingMapper and StreamingReducer?

          Did not do this. It makes the code more complicated, because the mapper and reducer have different configuration parameter names.

          AutoInputFormat2

          No configure method like in AutoInputFormat?

          The new API does not have configure() for input formats.
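          For context, a sketch of where a new-API input format reads its settings
          instead: there is no JobConfigurable.configure(JobConf) hook, so the
          configuration is pulled from the context in createRecordReader(). The class
          name here is hypothetical.

          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.InputSplit;
          import org.apache.hadoop.mapreduce.RecordReader;
          import org.apache.hadoop.mapreduce.TaskAttemptContext;
          import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

          public class NewApiFormatSketch extends FileInputFormat<Text, Text> {
            @Override
            public RecordReader<Text, Text> createRecordReader(
                InputSplit split, TaskAttemptContext context) throws IOException {
              // Replaces the old configure(JobConf) call.
              Configuration conf = context.getConfiguration();
              String separator = conf.get("stream.map.output.field.separator", "\t");
              throw new UnsupportedOperationException(
                  "sketch only; a real format would return a reader using " + separator);
            }
          }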

          StreamJob.java

          Has the compatibility been left in for one release?

          Yes. All the removed deprecated methods have been deprecated since release 0.19.

          TrApp.java

          Some expect() and expectDefined() calls are dropped. I could understand why the ones related to the output format are dropped, to accommodate testing both the new and old APIs. But removing the checks related to the input file and file length didn't make sense to me.

          The new API does not have the configuration parameters for the input file and length (HADOOP-5973).
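          For reference, in the new API the same information is read off the input
          split rather than from configuration; a sketch of the common idiom (the
          class name is hypothetical):

          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;
          import org.apache.hadoop.mapreduce.lib.input.FileSplit;

          public class InputFileSketch extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void setup(Context context) {
              // map.input.file / map.input.length no longer exist as conf keys;
              // the equivalent data lives on the InputSplit.
              FileSplit split = (FileSplit) context.getInputSplit();
              Path inputFile = split.getPath();
              long length = split.getLength();
            }
          }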

          Should we make the initialize methods in InputWriter and OutputReader abstract now?

          Did not do this. I don't think it is required.

          The patch incorporates all other comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12458320/patch-1122-1.txt
          against trunk revision 1075216.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 100 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/93//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/93//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/93//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          Sorry to come in late, the patch has gone stale. Can you please rebase? Thanks.

          Mathias Herberts added a comment -

          What is needed for this issue to move forward? Input and output formats using the new API will become more and more frequent these days; people using streaming won't be able to benefit from them.


            People

            • Assignee: Amareshwari Sriramadasu
            • Reporter: Keith Jackson
            • Votes: 5
            • Watchers: 14