Hadoop Map/Reduce / MAPREDUCE-6012

DBInputSplit creates invalid ranges on Oracle

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.1, 2.4.1
    • Fix Version/s: 1.3.0, 2.6.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The DBInputFormat on Oracle does not create valid ranges.

      Line 263, in the getSplits method, is as follows:

      split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);

      So the first split will have a start value of 0 (0*chunkSize).

      However, the OracleDBRecordReader, line 84 is as follows:

      if (split.getLength() > 0 && split.getStart() > 0){

      Since the start value of the first range is equal to 0, the reader skips the block that restricts the query to the split's partition of the input set. As a result, one of the map tasks will process the entire data set rather than its partition.

      I'm assuming the fix is trivial and would involve removing the second check in the if block.
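The effect of the guard can be reproduced without a database. The following is a minimal sketch, not the Hadoop source: the class name, the helper names, and the assumption that the last split absorbs any remainder are all illustrative. It computes split ranges the way the quoted getSplits line does and evaluates both the existing guard and the proposed one for each split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; none of these names come from Hadoop.
class SplitRangeSketch {

    /** Returns {start, end} pairs computed like the quoted getSplits line. */
    static List<long[]> splitRanges(long rowCount, int chunks) {
        long chunkSize = rowCount / chunks;
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // Assumption: the last split ends at rowCount so no rows are lost.
            long end = (i + 1 == chunks) ? rowCount : start + chunkSize;
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    /** Guard as quoted from OracleDBRecordReader: false when start == 0. */
    static boolean currentGuard(long start, long length) {
        return length > 0 && start > 0;
    }

    /** Guard with the second check removed, as proposed in the description. */
    static boolean proposedGuard(long start, long length) {
        return length > 0;
    }

    public static void main(String[] args) {
        for (long[] r : splitRanges(100, 4)) {
            long length = r[1] - r[0];
            System.out.println("start=" + r[0] + " end=" + r[1]
                + " currentGuard=" + currentGuard(r[0], length)
                + " proposedGuard=" + proposedGuard(r[0], length));
        }
    }
}
```

With 100 rows and 4 splits, only the first split (start 0) fails the current guard, which is the split that ends up reading the whole table.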

      Also, I believe the OracleDBRecordReader paging query is incorrect.

      Line 92 should read:

      query.append(" ) WHERE dbif_rno > ").append(split.getStart());

      instead of the following (note > rather than >=):

      query.append(" ) WHERE dbif_rno >= ").append(split.getStart());

      Otherwise some rows will be ignored and some counted more than once.

      A map/reduce job that counts the number of rows based on a predicate will highlight the incorrect behavior.
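The double counting can also be checked with a small simulation. This is a sketch under stated assumptions: the outer SELECT wrapper in pagedQuery is the common Oracle ROWNUM paging idiom and is assumed here rather than quoted from the description (only the dbif_rno clause is), row numbers are taken to be 1-based as ROWNUM is, and every split, including the first, is assumed to go through the paging clause. All class and method names are hypothetical.

```java
// Hypothetical stand-alone sketch; none of these names come from Hadoop.
class OraclePagingSketch {

    /** Builds a paged query; strict selects "> start", otherwise ">= start". */
    static String pagedQuery(String baseQuery, long start, long end, boolean strict) {
        StringBuilder query = new StringBuilder();
        // Assumed wrapper: number the rows of the base query with ROWNUM.
        query.append("SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( ");
        query.append(baseQuery);
        query.append(" ) a WHERE rownum <= ").append(end);
        query.append(" ) WHERE dbif_rno ").append(strict ? "> " : ">= ").append(start);
        return query.toString();
    }

    /** Counts the 1-based row numbers in 1..rowCount that a split would read. */
    static long rowsSelected(long rowCount, long start, long end, boolean strict) {
        long count = 0;
        for (long rno = 1; rno <= rowCount; rno++) {
            boolean aboveStart = strict ? rno > start : rno >= start;
            if (rno <= end && aboveStart) {
                count++;
            }
        }
        return count;
    }

    /** Sums the rows read by all splits of size rowCount / chunks. */
    static long totalRead(long rowCount, int chunks, boolean strict) {
        long chunkSize = rowCount / chunks;
        long total = 0;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            total += rowsSelected(rowCount, start, start + chunkSize, strict);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(pagedQuery("SELECT id FROM t", 25, 50, true));
        System.out.println("rows read with >=: " + totalRead(100, 4, false));
        System.out.println("rows read with > : " + totalRead(100, 4, true));
    }
}
```

For 100 rows in 4 splits, the >= variant reads 103 rows in total because rows 25, 50 and 75 each fall into two adjacent splits, while the > variant reads each of the 100 rows exactly once.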

      Attachments

        1. HADOOP-9530.patch
          1 kB
          Wei Yan
        2. MAPREDUCE-6012-2-branch2.patch
          3 kB
          Wei Yan
        3. MAPREDUCE-6012-branch-1.patch
          1 kB
          Wei Yan

          Activity

            jserdaru Julien Serdaru added a comment -

            Looking into the backlog, it seems this issue looks into the problem highlighted in my issue, although the patch seems overcomplicated.

            Removing the split.getStart() > 0 check and changing the
            WHERE dbif_rno comparison to strictly greater than (>) rather than >= fixes all problems IMHO.

            ywskycn Wei Yan added a comment -

            jserdaru, we also ran into this problem. As you mentioned, Oracle SQL differs from MySQL, which uses a row offset. Uploaded a patch to fix this problem.

            zxu Zhihai Xu added a comment -

            ywskycn's patch looks good to me. His patch uses getEnd() instead of getStart() + getLength() in the SQL query, which simplifies the old code.
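A minimal sketch of why the two forms are interchangeable, assuming a split that stores its start and end and derives its length as end - start. RowSplit is a hypothetical stand-in, not Hadoop's DBInputSplit, and the clause text is illustrative rather than quoted from the patch.

```java
// Hypothetical stand-alone sketch; none of these names come from Hadoop.
class SplitBoundsSketch {

    /** Stand-in for a split that stores its start and end row numbers. */
    static final class RowSplit {
        private final long start;
        private final long end;

        RowSplit(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long getStart() { return start; }
        long getEnd() { return end; }
        long getLength() { return end - start; }
    }

    /** Upper bound derived from start + length. */
    static String upperBoundFromLength(RowSplit s) {
        return " ) a WHERE rownum <= " + (s.getStart() + s.getLength());
    }

    /** Upper bound taken directly from the end of the split. */
    static String upperBoundFromEnd(RowSplit s) {
        return " ) a WHERE rownum <= " + s.getEnd();
    }

    public static void main(String[] args) {
        RowSplit split = new RowSplit(25, 50);
        System.out.println(upperBoundFromLength(split));
        System.out.println(upperBoundFromEnd(split));
    }
}
```

Under that assumption both helpers produce the same clause, so reading the end of the split directly is only a simplification.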

            ywskycn Wei Yan added a comment -

            Also uploaded a patch for branch-1.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12658539/HADOOP-9530.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no new tests are needed for this patch.
            Also please list what manual steps were performed to verify this patch.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

            org.apache.hadoop.mapreduce.lib.db.TestDbClasses

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//testReport/
            Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//console

            This message is automatically generated.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12658549/MAPREDUCE-6012-branch-1.patch
            against trunk revision .

            -1 patch. The patch command could not apply the patch.

            Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4776//console

            This message is automatically generated.

            ywskycn Wei Yan added a comment -

            Uploaded a patch for branch-2 that also fixes the test error.

            hadoopqa Hadoop QA added a comment -

            +1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12658550/MAPREDUCE-6012-2-branch2.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 1 new or modified test files.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 javadoc. There were no new javadoc warning messages.

            +1 eclipse:eclipse. The patch built with eclipse:eclipse.

            +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//testReport/
            Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//console

            This message is automatically generated.


            kkambatl Karthik Kambatla (Inactive) added a comment -

            +1

            Spoke to Wei offline to understand the issue better, and his fix makes sense to me.

            kkambatl Karthik Kambatla (Inactive) added a comment -

            Thanks for the patch, Wei. Just committed this to trunk, branch-2 and branch-1.
            rchiang Ray Chiang added a comment -

            Thanks Wei. Glad to see this fixed.

            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-trunk-Commit #6086 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6086/)
            MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

            • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Yarn-trunk #651 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/651/)
            MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

            • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/)
            MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

            • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/)
            MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

            • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
            • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java

            People

              Assignee: ywskycn Wei Yan
              Reporter: jserdaru Julien Serdaru
              Votes: 1
              Watchers: 7
