Hadoop Map/Reduce: MAPREDUCE-1489

DataDrivenDBInputFormat should not query the database when generating only one split

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      DataDrivenDBInputFormat runs a query to establish the bounding values for the splits it generates. If it is going to generate only one split (mapreduce.job.maps == 1), there is no reason to run that query. Skipping it removes overhead from a single-threaded import of a non-indexed table, because it avoids a full table scan.
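
The short-circuit described above can be sketched in plain Java. This is only an illustration, not the committed patch: the class names `SplitSketch` and `SingleSplitSketch`, the method `planSplits`, and the `boundingQuery` supplier are hypothetical stand-ins, and the always-true clause `1=1` is an assumed way to express "the whole table". The supplier plays the role of the min/max query against the database, so the sketch runs without Hadoop or a JDBC connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for an input split: two WHERE-clause fragments. */
final class SplitSketch {
    final String lowerClause;
    final String upperClause;

    SplitSketch(String lowerClause, String upperClause) {
        this.lowerClause = lowerClause;
        this.upperClause = upperClause;
    }
}

/** Hypothetical planner showing the single-split short-circuit. */
final class SingleSplitSketch {
    private SingleSplitSketch() {}

    /**
     * @param numMaps       requested number of map tasks (mapreduce.job.maps)
     * @param column        name of the split column (assumed numeric here)
     * @param boundingQuery stand-in for the min/max query; returns {min, max}
     */
    static List<SplitSketch> planSplits(int numMaps, String column,
                                        Supplier<long[]> boundingQuery) {
        List<SplitSketch> splits = new ArrayList<>();

        if (numMaps <= 1) {
            // One split covers the whole table, so the bounds are never
            // needed and the bounding query is not executed at all.
            splits.add(new SplitSketch("1=1", "1=1"));
            return splits;
        }

        // Multiple splits: only now pay for the bounding query.
        long[] bounds = boundingQuery.get();
        long min = bounds[0];
        long max = bounds[1];
        long span = max - min + 1;

        for (int i = 0; i < numMaps; i++) {
            long lo = min + span * i / numMaps;
            long hi = min + span * (i + 1) / numMaps;
            boolean last = (i == numMaps - 1);
            String lower = column + " >= " + lo;
            // The last split is closed on the right so that max is included.
            String upper = last ? column + " <= " + max : column + " < " + hi;
            splits.add(new SplitSketch(lower, upper));
        }
        return splits;
    }
}
```

Passing the bounding query as a supplier makes the saving easy to observe: with one requested map task the supplier is never invoked, while with several it is invoked exactly once.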

          Activity

          Aaron Kimball added a comment -

          Attaching patch which improves DataDrivenDBInputFormat in this manner. Also updated use of deprecated config keys nearby.

          The majority of tests for DDDBIF are implicit in the Sqoop tests; I've modified these so that some Sqoop tests respect the old code-path by requesting multiple map tasks, but other Sqoop tests will use a single task.

          This patch is based on the code in MAPREDUCE-1460; will mark as patch-available when that's committed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12435756/MAPREDUCE-1489.patch
          against trunk revision 926449.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/49/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/49/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/49/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/49/console

          This message is automatically generated.

          Aaron Kimball added a comment -

          Hudson failed to test; recycling.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12435756/MAPREDUCE-1489.patch
          against trunk revision 926449.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/52/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/52/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/52/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/52/console

          This message is automatically generated.

          Aaron Kimball added a comment -

          Test failure in HAR is unrelated

          Tom White added a comment -

          +1 I've just committed this. Thanks Aaron!

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #271 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/271/).
          DataDrivenDBInputFormat should not query the database when generating only one split. Contributed by Aaron Kimball.


            People

            • Assignee: Aaron Kimball
            • Reporter: Aaron Kimball
            • Votes: 0
            • Watchers: 1
