Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Ports the mapred.join library to the new Map/Reduce API

      Description

      To change org.apache.hadoop.examples.Join to use new api, we need to change org.apache.hadoop.mapred.join to use new api. So,
      Deprecate the code in org.apache.hadoop.mapred.join.
      Copy the code to org.apache.hadoop.mapreduce.lib.join and Change it to use new api.
      Thoughts ?

      1. patch-355-4.txt
        176 kB
        Amareshwari Sriramadasu
      2. patch-355-3.txt
        176 kB
        Amareshwari Sriramadasu
      3. patch-355-2.txt
        176 kB
        Amareshwari Sriramadasu
      4. patch-355-1.txt
        159 kB
        Amareshwari Sriramadasu
      5. patch-355.txt
        158 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Hide
          Chris Douglas added a comment -

          +1 This looks good.

          I committed this. Thanks, Amareshwari!

          Show
          Chris Douglas added a comment - +1 This looks good. I committed this. Thanks, Amareshwari!
          Hide
          Amareshwari Sriramadasu added a comment -

          -1 core tests : due to MAPREDUCE-880
          -1 contrib tests : due to MAPREDUCE-699

          Show
          Amareshwari Sriramadasu added a comment - -1 core tests : due to MAPREDUCE-880 -1 contrib tests : due to MAPREDUCE-699
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12417255/patch-355-4.txt
          against trunk revision 806508.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12417255/patch-355-4.txt against trunk revision 806508. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/501/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          Since HADOOP-6103 is committed, Patch removes changes in JobContext.java.
          Also changed configuration names in new join package to be mapreduce.join.*, in favor of MAPREDUCE-849.

          Show
          Amareshwari Sriramadasu added a comment - Since HADOOP-6103 is committed, Patch removes changes in JobContext.java. Also changed configuration names in new join package to be mapreduce.join.*, in favor of MAPREDUCE-849 .
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12416163/patch-355-3.txt
          against trunk revision 803193.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416163/patch-355-3.txt against trunk revision 803193. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/467/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch updated with trunk

          Show
          Amareshwari Sriramadasu added a comment - Patch updated with trunk
          Hide
          Amareshwari Sriramadasu added a comment -

          patch doesnt apply to trunk

          Show
          Amareshwari Sriramadasu added a comment - patch doesnt apply to trunk
          Hide
          Amareshwari Sriramadasu added a comment -

          -1 release audit. Is Spurious. releaseAuditDiffWarnings.txt shows the diff in jdiff files.
          -1 core tests. Is not related to the patch. Raised MAPREDUCE-756 for the same.
          -1 contrib tests. Streaming test failures on trunk is known issue.

          Show
          Amareshwari Sriramadasu added a comment - -1 release audit. Is Spurious. releaseAuditDiffWarnings.txt shows the diff in jdiff files. -1 core tests. Is not related to the patch. Raised MAPREDUCE-756 for the same. -1 contrib tests. Streaming test failures on trunk is known issue.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12413098/patch-355-2.txt
          against trunk revision 793457.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 333 release audit warnings (more than the trunk's current 315 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12413098/patch-355-2.txt against trunk revision 793457. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 333 release audit warnings (more than the trunk's current 315 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/artifact/trunk/current/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/379/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          If the key is modified in the map, the outcome of the join is affected. It is necessary to clone the key out of the join collector. A test for this should probably be added to both TestJoinDatamerge and TestDatamerge. That the value in RRs in the tree is also not restored may also be an issue.

          I was not able to write a test case for this. I tried verifying by doing key.set(-1) in the mapper. I was seeing problem even if i clone.

          Thanks for the unit test Chris. I updated patch with unit test and fixed the bug.

          The checks verifying type consistency for keys in general and for values in MultiFilterRecordReader have been removed. Are these not necessary?

          The checks verifying type consistency for values in MultiFilterRecordReader are not necessary. Because we can have a join such as override(inner(A,B),A). Removed the consistency checks for values in MultiFilterRecordReader.

          Also added a test TestJoinProperties(suggested by Chris) which tests
          1. Outer join associativity : outer(outer(A, B), C) == outer(A, outer(B, C)) == outer(A, B, C)
          2. Inner join associativity : inner(inner(A, B), C) == inner(A, inner(B,C)) == inner(A, B, C)
          3. Override identity, inner consistency : override(inner(A,B),A) = A
          Also these tests use different value types in the sources.

          Show
          Amareshwari Sriramadasu added a comment - If the key is modified in the map, the outcome of the join is affected. It is necessary to clone the key out of the join collector. A test for this should probably be added to both TestJoinDatamerge and TestDatamerge. That the value in RRs in the tree is also not restored may also be an issue. I was not able to write a test case for this. I tried verifying by doing key.set(-1) in the mapper. I was seeing problem even if i clone. Thanks for the unit test Chris. I updated patch with unit test and fixed the bug. The checks verifying type consistency for keys in general and for values in MultiFilterRecordReader have been removed. Are these not necessary? The checks verifying type consistency for values in MultiFilterRecordReader are not necessary. Because we can have a join such as override(inner(A,B),A). Removed the consistency checks for values in MultiFilterRecordReader. Also added a test TestJoinProperties(suggested by Chris) which tests 1. Outer join associativity : outer(outer(A, B), C) == outer(A, outer(B, C)) == outer(A, B, C) 2. Inner join associativity : inner(inner(A, B), C) == inner(A, inner(B,C)) == inner(A, B, C) 3. Override identity, inner consistency : override(inner(A,B),A) = A Also these tests use different value types in the sources.
          Hide
          Amareshwari Sriramadasu added a comment -

          If the key is modified in the map, the outcome of the join is affected. It is necessary to clone the key out of the join collector. A test for this should probably be added to both TestJoinDatamerge and TestDatamerge. That the value in RRs in the tree is also not restored may also be an issue.

          I was not able to write a test case for this. I tried verifying by doing key.set(-1) in the mapper. I was seeing problem even if i clone.

          q should never be null and value is known non-null given the check added just above it. Why is the internal tuple created lazily?

          q could null if non of the kids have any input. (For ex. empty join). Removed non-null check for value and made sure createValue always creates an object.
          The internal tuple is created in nextKeyValue(), earlier the creation was happening at MapTask.

          The checks verifying type consistency for keys in general and for values in MultiFilterRecordReader have been removed. Are these not necessary?

          Added consistency checks for keys and values, if there is any input.

          I'm still working on adding more unit tests if possible.

          Show
          Amareshwari Sriramadasu added a comment - If the key is modified in the map, the outcome of the join is affected. It is necessary to clone the key out of the join collector. A test for this should probably be added to both TestJoinDatamerge and TestDatamerge. That the value in RRs in the tree is also not restored may also be an issue. I was not able to write a test case for this. I tried verifying by doing key.set(-1) in the mapper. I was seeing problem even if i clone. q should never be null and value is known non-null given the check added just above it. Why is the internal tuple created lazily? q could null if non of the kids have any input. (For ex. empty join). Removed non-null check for value and made sure createValue always creates an object. The internal tuple is created in nextKeyValue(), earlier the creation was happening at MapTask. The checks verifying type consistency for keys in general and for values in MultiFilterRecordReader have been removed. Are these not necessary? Added consistency checks for keys and values, if there is any input. I'm still working on adding more unit tests if possible.
          Hide
          Chris Douglas added a comment -
          • The patch does not apply cleanly (e.g. src/java/src/java/...)
          • If the key is modified in the map, the outcome of the join is affected. It is necessary to clone the key out of the join collector. A test for this should probably be added to both TestJoinDatamerge and TestDatamerge. That the value in RRs in the tree is also not restored may also be an issue.
          • Failing tests in TestJoinDatamerge will not cause the test to fail. The return value of job.waitForCompletion must be checked.
          • The changes to TupleWritable should be reverted. The methods should be private and the formatting changes are unnecessary.
          • This change looks wrong:
            +    if (value == null) {
            +      value = createValue();
            +    }
            +    final PriorityQueue<ComposableRecordReader<K,?>> q = 
            +            getRecordReaderQueue();
                 K iterkey = createKey();
            -    final PriorityQueue<ComposableRecordReader<K,?>> q = getRecordReaderQueue();
            -    while (!q.isEmpty()) {
            +    while (q != null && !q.isEmpty() && value != null) {
            

            q should never be null and value is known non-null given the check added just above it. Why is the internal tuple created lazily?

          • Now that Hadoop is on Java6
            +        throw (IOException)new IOException().initCause(e);
            

            These should be replaced with the cstr taking the cause.

          • This should update the javadoc for the Join example; it was copied from the sort example and never updated.
          • ArrayListBackedIterator is mostly just an example for testing, but creating a new Configuration for each call to next is not good.
          • The checks verifying type consistency for keys in general and for values in MultiFilterRecordReader have been removed. Are these not necessary?
          • There is commented code at CompositeRecordReader:422
          • CompositeRecordReader::createValue assumes that the value type of all the child RRs is the same, but this is not valid.
          Show
          Chris Douglas added a comment - The patch does not apply cleanly (e.g. src/java/src/java/...) If the key is modified in the map, the outcome of the join is affected. It is necessary to clone the key out of the join collector. A test for this should probably be added to both TestJoinDatamerge and TestDatamerge. That the value in RRs in the tree is also not restored may also be an issue. Failing tests in TestJoinDatamerge will not cause the test to fail. The return value of job.waitForCompletion must be checked. The changes to TupleWritable should be reverted. The methods should be private and the formatting changes are unnecessary. This change looks wrong: + if (value == null) { + value = createValue(); + } + final PriorityQueue<ComposableRecordReader<K,?>> q = + getRecordReaderQueue(); K iterkey = createKey(); - final PriorityQueue<ComposableRecordReader<K,?>> q = getRecordReaderQueue(); - while (!q.isEmpty()) { + while (q != null && !q.isEmpty() && value != null) { q should never be null and value is known non-null given the check added just above it. Why is the internal tuple created lazily? Now that Hadoop is on Java6 + throw (IOException)new IOException().initCause(e); These should be replaced with the cstr taking the cause. This should update the javadoc for the Join example; it was copied from the sort example and never updated. ArrayListBackedIterator is mostly just an example for testing, but creating a new Configuration for each call to next is not good. The checks verifying type consistency for keys in general and for values in MultiFilterRecordReader have been removed. Are these not necessary? There is commented code at CompositeRecordReader:422 CompositeRecordReader::createValue assumes that the value type of all the child RRs is the same, but this is not valid.
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching patch for review.
          Patch does the following:
          1. Changes org.apache.hadoop.examples.Join to use new api.
          2. Deprecates org.apache.hadoop.mapred.Join.* classes. All the classes are copied to org.apache.hadoop.mapreduce.lib.Join.* and modified to use new api. I did not remove the old code in org.apache.hadoop.mapred.Join.*RerordReader.java and org.apache.hadoop.mapred.Join.InputFormat.java since old code uses old RecordReaders and InputFormats. There is not much common code there. I'll still go through and re-use code if i can.
          3.Copied tests from org.apache.hadoop.mapred.Join.* to org.apache.hadoop.mapreduce.lib.Join.* and modified them to test new api.

          Show
          Amareshwari Sriramadasu added a comment - Attaching patch for review. Patch does the following: 1. Changes org.apache.hadoop.examples.Join to use new api. 2. Deprecates org.apache.hadoop.mapred.Join.* classes. All the classes are copied to org.apache.hadoop.mapreduce.lib.Join.* and modified to use new api. I did not remove the old code in org.apache.hadoop.mapred.Join.*RerordReader.java and org.apache.hadoop.mapred.Join.InputFormat.java since old code uses old RecordReaders and InputFormats. There is not much common code there. I'll still go through and re-use code if i can. 3.Copied tests from org.apache.hadoop.mapred.Join.* to org.apache.hadoop.mapreduce.lib.Join.* and modified them to test new api.

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development