Hadoop Map/Reduce
MAPREDUCE-3607

Port missing new API mapreduce lib classes to 1.x

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1
    • Component/s: client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There are a number of classes under mapreduce.lib that are not present in the 1.x series. Including these would help users and downstream projects using the new MapReduce API migrate to later versions of Hadoop in the future.

      A few examples of where this would help:

      • Sqoop uses mapreduce.lib.db.DBWritable and mapreduce.lib.input.CombineFileInputFormat (SQOOP-384).
      • Mahout uses mapreduce.lib.output.MultipleOutputs (MAHOUT-822).
      • HBase has a backport of mapreduce.lib.partition.InputSampler and TotalOrderPartitioner (in org.apache.hadoop.hbase.mapreduce.hadoopbackport) - it would be better if it used the ones in Hadoop.
      1. MAPREDUCE-3607.patch
        551 kB
        Tom White
      2. MAPREDUCE-3607.patch
        550 kB
        Tom White
      3. MAPREDUCE-3607.patch
        315 kB
        Tom White

          Activity

          Kihwal Lee added a comment -

          Tom: Thanks for the clarification. MAPREDUCE-4207 has been filed.

          Tom White added a comment -

          Kihwal - adding this line was clearly a mistake and so it should be removed. Please go ahead and file a JIRA.

          Kihwal Lee added a comment -

          Sorry for generating traffic on the closed jira.

          I just want to find out why the following line was added. I've heard some people complain about it. If there is a good reason to keep it, that would probably convince them as well. Otherwise, I will file a jira to remove the line.

          FileInputFormat.java.diff
          --- hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java	2011/11/27 21:31:26	1206848
          +++ hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java	2012/01/24 23:30:12	1235551
          @@ -422,6 +422,7 @@
              */
             public static Path[] getInputPaths(JobContext context) {
               String dirs = context.getConfiguration().get("mapred.input.dir", "");
          +    System.out.println("****" + dirs);
               String [] list = StringUtils.split(dirs);
               Path[] result = new Path[list.length];
               for (int i = 0; i < list.length; i++) {
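
For anyone who wants to keep this kind of diagnostic without writing to stdout, here is a minimal sketch of the same parsing step with the value logged at a debug-equivalent level. It is not Hadoop code: the class name InputDirsSketch and the method splitInputDirs are hypothetical, java.util.logging stands in for whatever logging facility the real class uses, and a plain comma split replaces StringUtils.split, so any escaping that method handles is ignored here.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the path-parsing step in getInputPaths; not Hadoop code. */
public class InputDirsSketch {
    private static final Logger LOG = Logger.getLogger(InputDirsSketch.class.getName());

    /** Splits a comma-separated "mapred.input.dir" value; escaping is ignored in this sketch. */
    public static String[] splitInputDirs(String dirs) {
        // Log at FINE rather than printing, so the value stays out of job
        // output unless the logging level is deliberately turned up.
        LOG.log(Level.FINE, "mapred.input.dir = {0}", dirs);
        if (dirs.isEmpty()) {
            // String.split would return a single empty element here.
            return new String[0];
        }
        return dirs.split(",");
    }

    public static void main(String[] args) {
        for (String dir : splitInputDirs("/data/in1,/data/in2")) {
            System.out.println(dir);
        }
    }
}
```

With the default logging configuration the FINE message is suppressed, so only callers that opt in see the configured directories.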
          
          Matt Foley added a comment -

          Closed upon release 1.0.1.

          Tom White added a comment -

          Thanks for the review Mahadev. I've committed this to branch-1 and branch-1.0.

          Matt Foley added a comment -

          Please commit to both branch-1 and branch-1.0. Thank you.

          Mahadev konar added a comment -

          +1 the changes look good to me.

          Tom White added a comment -

          I updated the patch to add Configuration.getInstances(). I also tested Sqoop against a copy of Hadoop built with this patch, and all of its unit tests passed (see SQOOP-384).

          Matt Foley added a comment - edited

          Tom, SQOOP-384 lists four mapreduce APIs needed by sqoop, and you've included all four of them in this patch. However, they also need a different signature of org.apache.hadoop.conf.Configuration.getInstances, as discussed in SQOOP-384 comment 13166568 and shown in the patch attached to that jira.

          Can you add that API to this patch, please?
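
To illustrate the kind of method being requested, here is a rough sketch under stated assumptions. It assumes the signature in question takes a property name and an interface type and returns one instance per configured class name; that shape is an assumption, not something confirmed in this thread. The class GetInstancesSketch and its map-backed storage are hypothetical stand-ins, not the Hadoop Configuration class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, map-backed stand-in for a configuration object; not Hadoop code. */
public class GetInstancesSketch {
    private final Map<String, String> props = new HashMap<>();

    public void set(String name, String value) {
        props.put(name, value);
    }

    /**
     * Reads a comma-separated list of class names from the named property and
     * returns one new instance of each, checked against the given interface.
     */
    public <U> List<U> getInstances(String name, Class<U> xface) {
        List<U> result = new ArrayList<>();
        String value = props.get(name);
        if (value == null || value.trim().isEmpty()) {
            return result; // unset property: no instances
        }
        for (String className : value.split(",")) {
            try {
                Class<?> c = Class.forName(className.trim());
                if (!xface.isAssignableFrom(c)) {
                    throw new RuntimeException(c + " does not implement " + xface);
                }
                // Requires an accessible no-argument constructor.
                result.add(xface.cast(c.getDeclaredConstructor().newInstance()));
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GetInstancesSketch conf = new GetInstancesSketch();
        conf.set("example.buffers", "java.lang.StringBuilder, java.lang.StringBuffer");
        List<CharSequence> buffers = conf.getInstances("example.buffers", CharSequence.class);
        System.out.println(buffers.size());
    }
}
```

The type check happens before instantiation, so a misconfigured class name fails fast with a message naming both the class and the expected interface.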

          Tom White added a comment -

          Here's a new patch which adds FieldSelectionMapper/Reducer, NLineInputFormat, SequenceFile input/output formats, JobControl, and partition classes, along with tests for all of the classes.

          The results of test-patch:

               [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 100 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     -1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.
          

          Note the findbugs warnings are present in trunk too, since this is a backport. Tests pass.

          I would like this to be considered for inclusion in 1.1.0.

          Tom White added a comment -

          Here's an initial patch which adds support (and tests) for the DB classes, CombineFileInputFormat, KeyValueInputFormat, MultipleInputs, MultipleOutputs, and BinaryPartitioner.

          This is a work in progress - I intend to add more classes.


            People

            • Assignee:
              Tom White
              Reporter:
              Tom White
            • Votes:
              0
              Watchers:
              6
