SQOOP-1078: incremental import from database in direct mode

    Details

      Description

      Sqoop's incremental import has a problem: every import after the
      first reports success, but the data never appears. A temporary file
      containing the data is created on HDFS, but it is deleted upon
      completion rather than being moved into place.

      The cause turns out to be a conflict between the "direct mode" database
      managers and "incremental mode" import. Ordinarily, Sqoop creates
      files named part-m-nnnnn, where nnnnn is an incrementing file
      partition number. The direct mode importer, however, creates files of
      the form data-nnnnn. This is a problem because AppendUtils, which
      is used to move files into place at the end of a direct import, only
      copies files that match the part-m-nnnnn format and discards the
      rest.
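
The mismatch described above can be illustrated with a minimal sketch. This is not the actual AppendUtils code: the class name, the method names, and the exact regular expression are assumptions made for illustration only. It shows how a filter that accepts only part-m-nnnnn names would silently drop every file written by a direct mode import.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only; hypothetical names, not Sqoop's real implementation. */
final class PartFileFilterSketch {
    // Assumed pattern: "part-m-" followed by exactly five digits.
    private static final Pattern PART_FILE = Pattern.compile("part-m-\\d{5}");

    /** True if the file name follows the part-m-nnnnn convention. */
    static boolean isPartFile(String name) {
        return PART_FILE.matcher(name).matches();
    }

    /** Returns the temporary file names that a part-m-only filter would keep. */
    static List<String> selectForMove(List<String> tempFileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : tempFileNames) {
            if (isPartFile(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Names in the ordinary (non-direct) style are kept.
        System.out.println(selectForMove(Arrays.asList("part-m-00000", "part-m-00001")));
        // Names in the direct mode style are all discarded, so nothing is moved.
        System.out.println(selectForMove(Arrays.asList("data-00000", "data-00001")));
    }
}
```

With such a filter, the second call returns an empty list, which matches the reported symptom: the import succeeds, but no data reaches the target directory.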

        Activity

        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop23 #941 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/941/)
        SQOOP-1078: incremental import from database in direct mode (Revision 92e94b911d203fafbd4f3784badd29431aa5bf78)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=92e94b911d203fafbd4f3784badd29431aa5bf78
        Files :

        • src/java/org/apache/sqoop/util/AppendUtils.java
        • src/java/org/apache/sqoop/util/DirectImportUtils.java
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop100 #717 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/717/)
        SQOOP-1078: incremental import from database in direct mode (Revision 92e94b911d203fafbd4f3784badd29431aa5bf78)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=92e94b911d203fafbd4f3784badd29431aa5bf78
        Files :

        • src/java/org/apache/sqoop/util/AppendUtils.java
        • src/java/org/apache/sqoop/util/DirectImportUtils.java
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop200 #753 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/753/)
        SQOOP-1078: incremental import from database in direct mode (Revision 92e94b911d203fafbd4f3784badd29431aa5bf78)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=92e94b911d203fafbd4f3784badd29431aa5bf78
        Files :

        • src/java/org/apache/sqoop/util/DirectImportUtils.java
        • src/java/org/apache/sqoop/util/AppendUtils.java
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop20 #740 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/740/)
        SQOOP-1078: incremental import from database in direct mode (Revision 92e94b911d203fafbd4f3784badd29431aa5bf78)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=92e94b911d203fafbd4f3784badd29431aa5bf78
        Files :

        • src/java/org/apache/sqoop/util/DirectImportUtils.java
        • src/java/org/apache/sqoop/util/AppendUtils.java
        Jarek Jarcec Cecho made changes -
        Status Patch Available [ 10002 ] -> Resolved [ 5 ]
        Fix Version/s 1.4.4 [ 12324082 ]
        Fix Version/s 1.3.0 [ 12317344 ]
        Resolution Fixed [ 1 ]
        Jarek Jarcec Cecho added a comment -

        Thank you Tim for your contribution, greatly appreciated!

        ASF subversion and git services added a comment -

        Commit 92e94b911d203fafbd4f3784badd29431aa5bf78 in branch refs/heads/trunk from Jarek Jarcec Cecho
        [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=92e94b9 ]

        SQOOP-1078: incremental import from database in direct mode

        (Tim Howe via Jarek Jarcec Cecho)

        Jarek Jarcec Cecho made changes -
        Assignee Tim Howe [ thowe_ta ]
        Jarek Jarcec Cecho added a comment -

        Thank you Tim Howe! The patch is great. Would you mind uploading it to Review Board and rebasing it on the current trunk? I'll be more than happy to review and commit it.

        Tim Howe made changes -
        Status Open [ 1 ] -> Patch Available [ 10002 ]
        Labels patch
        Fix Version/s 1.3.0 [ 12317344 ]
        Tim Howe made changes -
        Attachment sqoop-incremental-direct.patch [ 12587676 ]
        Tim Howe added a comment -

        I've written a patch which causes direct imports to use the same naming convention used elsewhere. Attached please also find some changes to AppendUtils which improve resiliency, especially if there happen to be multiple concurrent operations on the same table. This patch is against sqoop-1.3.0-cdh3u3 but seems to apply and build with minimal changes across the whole 1.x series.

        Note: I don't know where the "part-m-nnnnn" naming comes from and if the "-m" signifies anything. I did hunt around in order to find the code which creates those files but with no luck.
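
As a rough illustration of the naming convention discussed in this comment, the sketch below picks the next unused partition number from the names already present in a target directory and formats a new part-m-nnnnn name. It is not taken from the attached patch: the class name, method names, and regular expression are hypothetical. For context, in Hadoop MapReduce output naming the "m" marks a file written by a map task (reducer output uses "r").

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only; hypothetical names, not the code from the patch. */
final class PartNamingSketch {
    // Assumed pattern: captures the five-digit partition number.
    private static final Pattern PART_FILE = Pattern.compile("part-m-(\\d{5})");

    /** Returns one more than the highest partition number found, or 0 if none. */
    static int nextPartition(List<String> existingNames) {
        int next = 0;
        for (String name : existingNames) {
            Matcher m = PART_FILE.matcher(name);
            if (m.matches()) {
                next = Math.max(next, Integer.parseInt(m.group(1)) + 1);
            }
        }
        return next;
    }

    /** Formats a partition number as a zero-padded part-m-nnnnn file name. */
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-m-%05d", partition);
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("part-m-00000", "part-m-00001");
        // Prints "part-m-00002": the first name that does not collide.
        System.out.println(partFileName(nextPartition(existing)));
    }
}
```

Scanning for the highest existing number only avoids collisions with files that are already in place. Handling several concurrent imports into the same table, which the comment mentions, would need additional safeguards that this sketch does not attempt.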

        Tim Howe made changes -
        Status Patch Available [ 10002 ] -> Open [ 1 ]
        Tim Howe made changes -
        Status Open [ 1 ] -> Patch Available [ 10002 ]
        Tim Howe made changes -
        Field Original Value New Value
        Summary Sqoop incremental import from database in direct mode
        Tim Howe created issue -

          People

          • Assignee: Tim Howe
          • Reporter: Tim Howe
          • Votes: 0
          • Watchers: 4
