Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Add export capability to Sqoop

Description

      Sqoop can import from a database into HDFS. It's high time it works in reverse too.

Attachments

    1. MAPREDUCE-1168.patch (78 kB), Aaron Kimball
    2. MAPREDUCE-1168.2.patch (216 kB), Aaron Kimball

Activity

        Chris Douglas made changes -
        Component/s contrib/sqoop [ 12312930 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #146 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/146/)
        MAPREDUCE-1168. Export data to databases via Sqoop. Contributed by Aaron Kimball.

        Tom White made changes -
        Status Patch Available [ 10002 ] → Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Tom White added a comment -

        I've just committed this. Thanks Aaron!

        Aaron Kimball added a comment -

        Unrelated contrib failure in fair scheduler.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12426306/MAPREDUCE-1168.2.patch
        against trunk revision 884832.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 52 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/276/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/276/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/276/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/276/console

        This message is automatically generated.

        Aaron Kimball made changes -
        Status Open [ 1 ] → Patch Available [ 10002 ]
        Aaron Kimball made changes -
        Attachment MAPREDUCE-1168.2.patch [ 12426306 ]
        Aaron Kimball added a comment -

        Thanks for reviewing, Tom.

        I agree on the renames of ImportOptions -> SqoopOptions and ExportError -> ExportException. I had forgotten that exceptions with names ending in "Error" have an implied semantic meaning. For consistency, I've also renamed ImportError to ImportException.

        You'll need to do the following in the subversion tree after applying this patch:

        svn mv --force src/contrib/sqoop/src/java/org/apache/hadoop/sqoop/ImportOptions.java src/contrib/sqoop/src/java/org/apache/hadoop/sqoop/SqoopOptions.java
        
        svn mv --force src/contrib/sqoop/src/java/org/apache/hadoop/sqoop/util/ImportError.java src/contrib/sqoop/src/java/org/apache/hadoop/sqoop/util/ImportException.java
        
        svn mv --force src/contrib/sqoop/src/test/org/apache/hadoop/sqoop/TestImportOptions.java src/contrib/sqoop/src/test/org/apache/hadoop/sqoop/TestSqoopOptions.java
        

        AutoInputFormat is currently based on the old API; until it is ported, it's incompatible with the new API (which Sqoop tries to use as much as possible). It is definitely worth keeping in mind for the future, though, once streaming is moved to the new API. I've copied the magic-number test from AutoInputFormat into isSequenceFiles(). Note that this means any text file beginning with the characters "SEQ" will now be misinterpreted as a SequenceFile. This case is hopefully rare; we can address it if it comes up in practice.
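
        For illustration, a minimal Java sketch of this kind of magic-number check might look like the following. This is not the patch's actual code; the class and method names are hypothetical.

        // Hypothetical sketch of a "SEQ" magic-number check; names are illustrative,
        // not taken from the patch.
        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class FileTypeSketch {
          /** Returns true if the file at 'path' begins with the SequenceFile magic "SEQ". */
          public static boolean looksLikeSequenceFile(Configuration conf, Path path)
              throws IOException {
            FileSystem fs = path.getFileSystem(conf);
            byte[] header = new byte[3];
            FSDataInputStream in = fs.open(path);
            try {
              in.readFully(header); // first three bytes; throws EOFException on shorter files
            } finally {
              in.close();
            }
            // As noted above, a plain text file that happens to start with "SEQ"
            // would be misclassified by this check.
            return header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
          }
        }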

        This patch is also resync'd to trunk.

        Aaron Kimball made changes -
        Status Patch Available [ 10002 ] → Open [ 1 ]
        Tom White added a comment -

        +1 This looks good. A few nits:

        • ExportError. Should this be ExportException? Errors are generally only thrown by the JVM itself.
        • ImportOptions. This is now being used for export options too, so should probably be renamed SqoopOptions.
        • ExportJob#isSequenceFiles() could use magic numbers to detect the format. Could AutoInputFormat be used, or adapted?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12423616/MAPREDUCE-1168.patch
        against trunk revision 831037.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 12 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/109/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/109/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/109/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/109/console

        This message is automatically generated.

        Aaron Kimball made changes -
        Status Open [ 1 ] → Patch Available [ 10002 ]
        Aaron Kimball made changes -
        Attachment MAPREDUCE-1168.patch [ 12423616 ]
        Aaron Kimball added a comment -

        This patch provides Sqoop with the ability to export tables from HDFS to an external RDBMS. Sqoop runs a MapReduce job over the contents of a directory (identified by --export-dir), parsing the records contained within based on the auto-generated class definition for a table. DBOutputFormat is used to inject the records back into the database table (specified by --table). The table must already exist in the target database.

        Sqoop can auto-generate the appropriate ORM class for parsing the input files by examining the target table (much as is done during importing); the existing command-line options that govern delimiters are used to specify which delimiters are used in the files to be exported.

        If an ORM class has already been generated for the table, this can now be specified with the --jar-file and --class-name options; code auto-generation is bypassed in this case. (This applies to imports as well.)

        Export supports both delimited text files and SequenceFiles containing SqoopRecords as values (i.e., SequenceFiles created via a Sqoop import with --as-sequencefile). Users do not need to identify the file type; it is automatically inferred. Gzipped text files will be handled transparently.

        Testing has been performed via unit tests (included) against HSQLDB with several column datatypes. I performed manual larger-scale testing by exporting 100 MB and 500 MB datasets, containing 1 million and 5 million rows respectively, to tables in MySQL.
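
        As a rough illustration of the flow described above, a map-only MapReduce job that parses delimited text from an export directory and writes rows to a database table through DBOutputFormat might be wired up roughly as follows. This is a hypothetical sketch, not the patch's code: the record class, table name, column names, and JDBC settings are invented, and Sqoop's auto-generated ORM classes handle parsing far more carefully.

        // Hypothetical sketch only: a map-only job that reads comma-delimited text
        // from an export directory and writes each record to a database table via
        // DBOutputFormat. The record class, table, columns, and JDBC settings below
        // are invented for illustration and are not Sqoop's generated code.
        import java.io.DataInput;
        import java.io.DataOutput;
        import java.io.IOException;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.sql.SQLException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.NullWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.io.Writable;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
        import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
        import org.apache.hadoop.mapreduce.lib.db.DBWritable;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class ExportSketch {

          /** Stand-in for the ORM class Sqoop auto-generates for a table. */
          public static class EmployeeRecord implements Writable, DBWritable {
            private int id;
            private String name;

            /** Parse one comma-delimited text record. */
            public void parse(String line) {
              String[] fields = line.split(",");
              id = Integer.parseInt(fields[0].trim());
              name = fields[1].trim();
            }

            public void write(PreparedStatement stmt) throws SQLException {
              stmt.setInt(1, id);        // column order matches setOutput() below
              stmt.setString(2, name);
            }
            public void readFields(ResultSet rs) throws SQLException {
              id = rs.getInt(1);
              name = rs.getString(2);
            }
            public void write(DataOutput out) throws IOException {
              out.writeInt(id);
              out.writeUTF(name);
            }
            public void readFields(DataInput in) throws IOException {
              id = in.readInt();
              name = in.readUTF();
            }
          }

          /** Each input line becomes one row emitted to DBOutputFormat. */
          public static class ExportMapper
              extends Mapper<LongWritable, Text, EmployeeRecord, NullWritable> {
            private final EmployeeRecord record = new EmployeeRecord();
            @Override
            protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
              record.parse(value.toString());
              context.write(record, NullWritable.get());
            }
          }

          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Illustrative JDBC settings; the target table must already exist.
            DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost/exportdb", "user", "password");

            Job job = new Job(conf, "export-sketch");
            job.setJarByClass(ExportSketch.class);
            job.setMapperClass(ExportMapper.class);
            job.setNumReduceTasks(0);                       // export is map-only
            job.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));  // the export directory
            job.setOutputFormatClass(DBOutputFormat.class);
            job.setOutputKeyClass(EmployeeRecord.class);
            job.setOutputValueClass(NullWritable.class);
            DBOutputFormat.setOutput(job, "employees", "id", "name");
            System.exit(job.waitForCompletion(true) ? 0 : 1);
          }
        }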

        Aaron Kimball created issue -

People

    • Assignee: Aaron Kimball
    • Reporter: Aaron Kimball
    • Votes: 0
    • Watchers: 5

Dates

    • Created:
    • Updated:
    • Resolved:
