[MAPREDUCE-6012] DBInputSplit creates invalid ranges on Oracle - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.1, 2.4.1
Fix Version/s: 1.3.0, 2.6.0
Component/s: None
Labels: None
Target Version/s: 2.6.0
Hadoop Flags: Reviewed

Description

The DBInputFormat on Oracle does not create valid ranges.

The getSplits method, at line 263, creates each split as follows:

split = new DBInputSplit(i * chunkSize, (i * chunkSize) + chunkSize);

So the first split will have a start value of 0 (0*chunkSize).
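To make the ranges concrete, here is a minimal sketch of the chunking arithmetic. It is a stand-in, not the Hadoop source: the class name SplitRangeSketch, the ranges helper, and the rule that the last split absorbs the remainder rows are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the chunking loop; not the Hadoop source. */
final class SplitRangeSketch {

  /** Returns one {start, end} pair per split, using the quoted expression. */
  static List<long[]> ranges(long rowCount, int chunks) {
    long chunkSize = rowCount / chunks;
    List<long[]> result = new ArrayList<>();
    for (int i = 0; i < chunks; i++) {
      long start = i * chunkSize;
      // Assumption: the last split ends at rowCount so no rows are dropped.
      long end = (i + 1 == chunks) ? rowCount : start + chunkSize;
      result.add(new long[] {start, end});
    }
    return result;
  }

  public static void main(String[] args) {
    // 10 rows in 3 chunks: the first split always starts at 0.
    for (long[] r : ranges(10, 3)) {
      System.out.println("start=" + r[0] + " end=" + r[1]);
    }
  }
}
```

With 10 rows and 3 chunks the sketch yields the ranges (0, 3), (3, 6) and (6, 10), so the start of the first split is 0 regardless of the chunk size.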

However, the OracleDBRecordReader, line 84 is as follows:

if (split.getLength() > 0 && split.getStart() > 0){

Since the start value of the first split is 0, the block that partitions the input set is skipped for that split. As a result, one of the map tasks will process the entire data set rather than its partition.
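The effect of the guard can be sketched in isolation. The class and method names below are hypothetical, and the split length is assumed to be end minus start; only the two boolean expressions come from this report.

```java
/** Hypothetical sketch comparing the quoted guard with the proposed one. */
final class PagingGuardSketch {

  /** Guard as quoted from OracleDBRecordReader in this report. */
  static boolean originalGuard(long start, long length) {
    return length > 0 && start > 0;
  }

  /** Guard with the second check removed, as the reporter proposes. */
  static boolean proposedGuard(long start, long length) {
    return length > 0;
  }

  public static void main(String[] args) {
    // Illustrative {start, end} pairs; length is assumed to be end - start.
    long[][] splits = {{0, 3}, {3, 6}, {6, 10}};
    for (long[] s : splits) {
      long length = s[1] - s[0];
      System.out.println("start=" + s[0]
          + " original=" + originalGuard(s[0], length)
          + " proposed=" + proposedGuard(s[0], length));
    }
  }
}
```

Under the original guard the split starting at 0 is the only one that bypasses the paging block, which is why its map task would read the unrestricted query.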

I'm assuming the fix is trivial and would involve removing the second check in the if block.

Also, I believe the OracleDBRecordReader paging query is incorrect.

Line 92 should read:

query.append(" ) WHERE dbif_rno > ").append(split.getStart());

instead of the following (note > rather than >=):

query.append(" ) WHERE dbif_rno >= ").append(split.getStart());

Otherwise some rows will be ignored and some counted more than once.
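The boundary behaviour can be checked without a database. The sketch below simulates Oracle's 1-based ROWNUM over a hypothetical 10-row result and counts how many rows each split selects. It assumes paging is applied to every split, that the upper bound is rownum <= end, and that the split ranges are (0, 3), (3, 6) and (6, 10); the class and method names are illustrative only.

```java
/** Hypothetical simulation of the ROWNUM filters; no database involved. */
final class RownumBoundarySketch {

  /** Counts simulated rows (ROWNUM 1..rowCount) that one split's filter selects. */
  static int rowsSelected(long rowCount, long start, long end, boolean strictLowerBound) {
    int selected = 0;
    for (long rno = 1; rno <= rowCount; rno++) {
      boolean aboveStart = strictLowerBound ? rno > start : rno >= start;
      if (rno <= end && aboveStart) {
        selected++;
      }
    }
    return selected;
  }

  /** Sums the rows read across all splits. */
  static int totalRead(long rowCount, long[][] splits, boolean strictLowerBound) {
    int total = 0;
    for (long[] s : splits) {
      total += rowsSelected(rowCount, s[0], s[1], strictLowerBound);
    }
    return total;
  }

  public static void main(String[] args) {
    long[][] splits = {{0, 3}, {3, 6}, {6, 10}};
    System.out.println("rows read with >= : " + totalRead(10, splits, false));
    System.out.println("rows read with >  : " + totalRead(10, splits, true));
  }
}
```

In this simulation the >= form reads 12 rows from a 10-row input because rows 3 and 6 fall into two splits each, while the strict > form reads each row exactly once.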

A map/reduce job that counts the number of rows based on a predicate will highlight the incorrect behavior.

Attachments

HADOOP-9530.patch - 30/Jul/14 00:05 - 1 kB - Wei Yan
MAPREDUCE-6012-2-branch2.patch - 30/Jul/14 00:50 - 3 kB - Wei Yan
MAPREDUCE-6012-branch-1.patch - 30/Jul/14 00:36 - 1 kB - Wei Yan

Issue Links

is related to

HADOOP-8331 Created patch that adds oracle support to DBInputFormat and solves a splitting duplication problem introduced with my last patch.

Open

Activity

Julien Serdaru added a comment - 01/May/13 02:57

Looking through the backlog, it seems this issue covers the problem highlighted in mine, although the patch seems overcomplicated.

Removing the split.getStart() > 0 check and changing the WHERE dbif_rno comparison from >= to strictly greater than (>) fixes all the problems, IMHO.

Wei Yan added a comment - 30/Jul/14 00:05

jserdaru, we also ran into this problem. As you mentioned, Oracle SQL differs from MySQL, which uses a row offset. Uploaded a patch to fix this problem.

Zhihai Xu added a comment - 30/Jul/14 00:36

ywskycn's patch looks good to me. It uses getEnd() instead of getStart() + getLength() in the SQL query, which simplifies the old code.
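As a rough illustration of that simplification, the following sketch builds a paged query from a split's start and end values. It is not the committed patch: the class name, the pagedQuery helper, the sample inner query, and the exact wording of the outer SELECT wrapper are assumptions. Only the dbif_rno comparison is quoted in this issue.

```java
/** Hypothetical sketch of a ROWNUM-based paging wrapper; not the committed patch. */
final class OraclePagingQuerySketch {

  /**
   * Wraps innerQuery so that only rows with start < ROWNUM <= end are returned.
   * The end value plays the role of getEnd(), replacing getStart() + getLength().
   */
  static String pagedQuery(String innerQuery, long start, long end) {
    StringBuilder query = new StringBuilder();
    // Assumed wrapper shape: number the rows, cap them at end, then drop rows up to start.
    query.append("SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( ");
    query.append(innerQuery);
    query.append(" ) a WHERE rownum <= ").append(end);
    query.append(" ) WHERE dbif_rno > ").append(start);
    return query.toString();
  }

  public static void main(String[] args) {
    // The table and column names are placeholders.
    System.out.println(pagedQuery("SELECT id, name FROM employees", 3, 6));
  }
}
```

Passing the end value directly avoids repeating the start + length arithmetic inside the SQL string.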

Wei Yan added a comment - 30/Jul/14 00:36

Also uploaded a patch for branch-1.

Hadoop QA added a comment - 30/Jul/14 00:39

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12658539/HADOOP-9530.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

org.apache.hadoop.mapreduce.lib.db.TestDbClasses

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/4386//console

This message is automatically generated.

Hadoop QA added a comment - 30/Jul/14 00:43

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12658549/MAPREDUCE-6012-branch-1.patch
against trunk revision .

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4776//console

This message is automatically generated.

Wei Yan added a comment - 30/Jul/14 00:50

Uploaded a patch for branch-2 that also fixes the test error.

Hadoop QA added a comment - 30/Jul/14 01:34

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12658550/MAPREDUCE-6012-2-branch2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test files.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. There were no new javadoc warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4777//console

This message is automatically generated.

Karthik Kambatla (Inactive) added a comment - 10/Aug/14 21:11

+1. Spoke to Wei offline to understand the issue better, and his fix makes sense to me.

Karthik Kambatla (Inactive) added a comment - 18/Aug/14 18:45

Thanks for the patch, Wei. Just committed this to trunk, branch-2 and branch-1.

Ray Chiang added a comment - 18/Aug/14 19:59

Thanks Wei. Glad to see this fixed.

Hudson added a comment - 19/Aug/14 00:16

FAILURE: Integrated in Hadoop-trunk-Commit #6086 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6086/)
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java

Hudson added a comment - 19/Aug/14 16:20

FAILURE: Integrated in Hadoop-Yarn-trunk #651 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/651/)
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java

Hudson added a comment - 19/Aug/14 18:56

FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/)
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java

Hudson added a comment - 20/Aug/14 00:01

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/)
MAPREDUCE-6012. DBInputSplit creates invalid ranges on Oracle. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618694)

/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/db/OracleDBRecordReader.java
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/db/TestDbClasses.java

People

Assignee: Wei Yan
Reporter: Julien Serdaru
Votes: 1
Watchers: 7

Dates

Created: 01/May/13 01:23
Updated: 01/Dec/14 03:09
Resolved: 18/Aug/14 18:45

Hadoop Map/Reduce
