Hadoop Common / HADOOP-5844

Use mysqldump when connecting to local mysql instance in Sqoop

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Sqoop uses MapReduce and DBInputFormat to read the contents of a table into HDFS. On many databases, this implementation is O(N^2) in the number of rows. Running multiple mappers also adds little throughput, because the database itself is inherently single-threaded. While DBInputFormat/JDBC provides a useful fallback mechanism for importing from databases, database-specific dump utilities nearly always provide higher throughput and should be used when available. This patch lets users read from local MySQL instances with mysqldump instead of the MapReduce-based input.
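The quadratic cost can be made concrete with a small cost model. This is a back-of-the-envelope sketch, not a model of DBInputFormat's actual query plan: it assumes, for illustration only, that rows are fetched in fixed-size chunks with LIMIT/OFFSET queries and that the database must scan past every skipped row to honor an OFFSET. Under those assumptions, reading N rows in chunks of p rows touches about N^2/(2p) rows in total. The class and method names are hypothetical.

```java
// Hypothetical cost model: rows the database touches when a table of
// `totalRows` rows is read in chunks of `chunkSize` rows via LIMIT/OFFSET,
// assuming each query scans OFFSET rows before returning LIMIT rows.
final class OffsetScanCost {

    /** Total rows scanned across all chunked LIMIT/OFFSET queries. */
    static long rowsScanned(long totalRows, long chunkSize) {
        long scanned = 0;
        for (long offset = 0; offset < totalRows; offset += chunkSize) {
            long limit = Math.min(chunkSize, totalRows - offset);
            scanned += offset + limit; // skip `offset` rows, then read `limit` rows
        }
        return scanned;
    }

    public static void main(String[] args) {
        long chunk = 1_000;
        for (long n : new long[] {10_000, 100_000, 1_000_000}) {
            // A single sequential pass (as a dump utility makes) touches n rows.
            System.out.printf("N=%,d  sequential=%,d  chunked=%,d%n",
                    n, n, rowsScanned(n, chunk));
        }
    }
}
```

With a fixed chunk size of 1,000 rows, growing N tenfold multiplies the rows scanned roughly a hundredfold (55,000 for 10,000 rows versus 5,050,000 for 100,000 rows), while a single sequential pass grows only tenfold.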

      If you invoke Sqoop with arguments of the form "--connect jdbc:mysql://localhost/somedatabase --local", it uses the mysqldump fast path to perform the import.
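The selection rule can be sketched as follows. This is a minimal illustration, not the patch's code: the class and method names (FastPathSelector, useMysqldump, databaseName, buildDumpCommand) are hypothetical, treating "localhost" or "127.0.0.1" as local is a simplifying assumption, and the mysqldump options shown (--compact, --no-create-info, --quick) are standard mysqldump flags chosen for illustration rather than the ones the patch necessarily passes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: decide whether an import can take the mysqldump fast
// path, and build (but do not run) a corresponding mysqldump command line.
final class FastPathSelector {

    /** True when --local was given and the JDBC URL names a MySQL server on this host. */
    static boolean useMysqldump(String connectString, boolean localFlag) {
        if (!localFlag || !connectString.startsWith("jdbc:mysql://")) {
            return false;
        }
        // Strip the "jdbc:" prefix so java.net.URI can parse "mysql://host/db".
        URI uri = URI.create(connectString.substring("jdbc:".length()));
        String host = uri.getHost();
        return "localhost".equals(host) || "127.0.0.1".equals(host);
    }

    /** Database name taken from the URL path, e.g. "somedatabase". */
    static String databaseName(String connectString) {
        URI uri = URI.create(connectString.substring("jdbc:".length()));
        String path = uri.getPath(); // e.g. "/somedatabase"
        return path.startsWith("/") ? path.substring(1) : path;
    }

    /** mysqldump argv that writes only the table's rows to stdout. */
    static List<String> buildDumpCommand(String connectString, String table) {
        List<String> cmd = new ArrayList<>();
        cmd.add("mysqldump");
        cmd.add("--compact");        // less verbose output (no comments, locks, charset statements)
        cmd.add("--no-create-info"); // rows only, no CREATE TABLE statement
        cmd.add("--quick");          // fetch rows one at a time instead of buffering the table
        cmd.add(databaseName(connectString));
        cmd.add(table);
        return cmd;
    }
}
```

For example, useMysqldump("jdbc:mysql://localhost/somedatabase", true) returns true, and the command could then be launched with new ProcessBuilder(cmd).start(), with the process's standard output copied into HDFS.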

      This patch, naturally, requires MySQL to be installed on the machine where it is tested. The test it adds is therefore named LocalMySQLTest rather than following the Hadoop-preferred naming (TestLocalMySQL), so that Hudson does not run it automatically. You can run the test yourself with "ant -Dtestcase=LocalMySQLTest test". See the javadoc for the LocalMySQLTest class for notes on how to set up the MySQL test environment.
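Why the class name matters can be sketched with a file-name glob. This assumes, for illustration, that the automatic test run picks up files matching Test*.java; the exact pattern lives in Hadoop's build file and is not quoted here. The class name AutoTestFilter and the src/test/ paths are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical illustration: a test run that selects classes by a Test*.java
// file-name pattern skips LocalMySQLTest but would pick up TestLocalMySQL.
final class AutoTestFilter {

    // Assumed pattern; "**/" lets the glob match files in any subdirectory.
    private static final PathMatcher AUTO_RUN =
            FileSystems.getDefault().getPathMatcher("glob:**/Test*.java");

    static boolean runsAutomatically(String sourcePath) {
        return AUTO_RUN.matches(Paths.get(sourcePath));
    }

    public static void main(String[] args) {
        System.out.println(runsAutomatically("src/test/LocalMySQLTest.java")); // false
        System.out.println(runsAutomatically("src/test/TestLocalMySQL.java")); // true
    }
}
```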

      Attachments

      1. mysqldump.patch (25 kB, Aaron Kimball)

        Issue Links

          Activity

          Gavin made changes -
          Link This issue depends upon HADOOP-5815 [ HADOOP-5815 ]
          Gavin made changes -
          Link This issue depends on HADOOP-5815 [ HADOOP-5815 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment - Integrated in Hadoop-trunk #863 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/ )
          Tom White made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.21.0 [ 12313563 ]
          Resolution Fixed [ 1 ]
          Tom White added a comment -

          I've just committed this. Thanks Aaron!

          Tom White added a comment -

          +1 Looks good.

          Aaron Kimball added a comment -

          The contrib test failure is unrelated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408205/mysqldump.patch
          against trunk revision 779656.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/419/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/419/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/419/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/419/console

          This message is automatically generated.

          Aaron Kimball added a comment -

          As a side note, this patch fixes a bug from HADOOP-5815 in which Sqoop did not add sqoop.jar itself to the classpath it passes to javac. As a result, compiling generated code only worked in unit-test mode (which referenced the .class files in the build directory directly) or when sqoop.jar was present in $HADOOP_HOME/lib/ (whose contents were passed to javac).

          With this patch, generated code compilation works regardless of the location of sqoop.jar.
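The fix amounts to putting the location of Sqoop's own classes on the classpath handed to javac. A minimal sketch of one way to do that (not necessarily how the patch does it): ask the JVM where a known class was loaded from through its CodeSource, and place that location ahead of the other entries. The class and method names are hypothetical.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build the -classpath value for compiling generated code
// so that it includes the jar (or directory) holding Sqoop's own classes.
final class GeneratedCodeClasspath {

    /**
     * Jar file or class directory from which `anchor` was loaded, or null when
     * the JVM does not report one (e.g. for JDK bootstrap classes).
     */
    static String locationOf(Class<?> anchor) {
        CodeSource src = anchor.getProtectionDomain().getCodeSource();
        URL url = (src == null) ? null : src.getLocation();
        if (url == null) {
            return null;
        }
        try {
            return new File(url.toURI()).getPath();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // not a plain file: URL; caller keeps only `existing`
        }
    }

    /** Sqoop's own location (if known) first, then the existing entries. */
    static String buildClasspath(String ownLocation, List<String> existing) {
        List<String> entries = new ArrayList<>();
        if (ownLocation != null) {
            entries.add(ownLocation);
        }
        entries.addAll(existing);
        return String.join(File.pathSeparator, entries);
    }
}
```

A caller might compute buildClasspath(locationOf(SomeSqoopClass.class), libJars) and pass the result as the "-classpath" argument to javac, for example through javax.tools.ToolProvider.getSystemJavaCompiler().run(null, null, null, "-classpath", cp, sourceFile); note that getSystemJavaCompiler() returns null when no compiler is available. Because the location is discovered at run time, compilation does not depend on where sqoop.jar is installed.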

          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aaron Kimball added a comment -

          Cycling the patch status now that HADOOP-5815 is in, so that this patch actually gets tested.

          Aaron Kimball made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12408205/mysqldump.patch
          against trunk revision 776148.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/351/console

          This message is automatically generated.

          Aaron Kimball added a comment -

          Attached initial implementation.

          Aaron Kimball made changes -
          Link This issue depends on HADOOP-5815 [ HADOOP-5815 ]
          Aaron Kimball made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Aaron Kimball made changes -
          Field Original Value New Value
          Attachment mysqldump.patch [ 12408205 ]
          Aaron Kimball created issue -

            People

            • Assignee:
              Aaron Kimball
              Reporter:
              Aaron Kimball
            • Votes:
              0
              Watchers:
              2
