SQOOP-443

Calling sqoop with hive import is not working multiple times due to kept output directory

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.0-incubating, 1.4.1-incubating
    • Fix Version/s: 1.4.2
    • Component/s: None
    • Labels:
      None

      Description

      Hive does not remove the input directory in all cases when running the "LOAD DATA" command. This input directory is actually Sqoop's export directory. Because the directory is kept, running the same Sqoop command twice fails with the exception "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory $table already exists".

      This issue can easily be worked around by removing the directory manually, but that puts an unnecessary burden on users. It also complicates executing saved jobs, because an additional script has to be run.
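
      The clean-up idea discussed in this issue (remove the leftover directory, but only when it is empty) can be sketched as follows. This is a minimal illustration, not the committed patch: it uses java.nio.file on the local filesystem as a stand-in for HDFS, and the class name StagingDirCleanup and the method removeIfEmpty are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: local-filesystem stand-in for the HDFS staging directory.
public class StagingDirCleanup {

    /**
     * Deletes the given directory only if it exists and contains no entries.
     * Returns true if the directory was removed, false if it was left alone.
     */
    public static boolean removeIfEmpty(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false; // nothing to clean up
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            if (entries.iterator().hasNext()) {
                return false; // data is still present, so keep the directory
            }
        }
        Files.delete(dir);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Simulates a staging directory whose files were already moved away.
        Path staging = Files.createTempDirectory("staging");
        System.out.println("removed: " + removeIfEmpty(staging));
    }
}
```

      Checking for emptiness before deleting keeps the clean-up conservative: if the load left any data behind, the directory is not touched and the user can inspect it.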

      1. SQOOP-443.patch
        3 kB
        Jarek Jarcec Cecho
      2. SQOOP-443.patch
        3 kB
        Jarek Jarcec Cecho

        Activity

        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6 #111 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/111/)
        SQOOP-443. Calling sqoop with hive import is not working multiple times due
        to kept output directory
        (Jarek Jarcec Cecho via Kathleen Ting) (Revision 1334328)

        Result = SUCCESS
        kathleen :
        Files :

        • /sqoop/trunk/src/java/org/apache/sqoop/hive/HiveImport.java
        Kathleen Ting added a comment -

        Patch committed. Thanks Jarcec!

        jiraposter@reviews.apache.org added a comment -

        On 2012-05-04 16:58:55, Kathleen Ting wrote:

        > +1

        > Jarcec, Cheolsoo - sorry for not getting to this sooner.

        Kathleen Ting wrote:

        The timestamp of the patch uploaded to the jira is 2/18 but on review board I see a comment made at 4/19 stating "Patch rebase to current moved repository trunk." Jarcec, is the latest patch attached to the jira?

        Good point Kate,
        thanks for checking, I've uploaded rebased version to JIRA.

        Jarcec

        • Jarek

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/#review7570
        -----------------------------------------------------------

        On 2012-04-19 05:56:56, Jarek Cecho wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4798/

        -----------------------------------------------------------

        (Updated 2012-04-19 05:56:56)

        Review request for Sqoop, Arvind Prabhakar and Cheolsoo Park.

        Summary

        -------

        I've added code that removes the export directory if it is empty.

        (Recreating review on moved SVN repository)

        This addresses bug SQOOP-443.

        https://issues.apache.org/jira/browse/SQOOP-443

        Diffs

        -----

        /src/java/org/apache/sqoop/hive/HiveImport.java 1327832

        Diff: https://reviews.apache.org/r/4798/diff

        Testing

        -------

        ant -Dhadoopversion={20, 23, 100} test

        real testing environment based on CDH3

        Thanks,

        Jarek

        Jarek Jarcec Cecho added a comment -

        Attaching rebased patch for current trunk.

        jiraposter@reviews.apache.org added a comment -

        On 2012-05-04 16:58:55, Kathleen Ting wrote:

        > +1

        > Jarcec, Cheolsoo - sorry for not getting to this sooner.

        The timestamp of the patch uploaded to the jira is 2/18 but on review board I see a comment made at 4/19 stating "Patch rebase to current moved repository trunk." Jarcec, is the latest patch attached to the jira?

        • Kathleen

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/#review7570
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/#review7570
        -----------------------------------------------------------

        Ship it!

        +1
        Jarcec, Cheolsoo - sorry for not getting to this sooner.

        • Kathleen

        Jarek Jarcec Cecho added a comment -

        Hi Cheolsoo,
        I agree with you, let's create a separate JIRA ticket for adding a general way of removing the output directory (using some command line argument).

        Jarcec

        jiraposter@reviews.apache.org added a comment -

        On 2012-05-04 06:44:10, Cheolsoo Park wrote:

        > This patch has been posted for a while. It would be nice if someone could commit this patch.

        >

        > The jira SQOOP-483 will be likely to touch the same area of code, so it will be nice if we can avoid any merge conflicts.

        Hi Cheolsoo,
        thank you very much for your review! However, I believe that we have the "two committer" policy in Sqoop, so I'm not allowed to commit my own patch.

        Jarcec

        • Jarek

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/#review7548
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/#review7548
        -----------------------------------------------------------

        Ship it!

        This patch has been posted for a while. It would be nice if someone could commit this patch.

        The jira SQOOP-483 will be likely to touch the same area of code, so it will be nice if we can avoid any merge conflicts.

        • Cheolsoo

        Cheolsoo Park added a comment -

        Hi Nemon,

        I like your idea. In fact, I often find myself removing output dir, so having such an option will be very useful.

        That being said, I think that we should treat removing output dir in hive import differently from removing output dir in general sqoop import. In hive import, output dir is no more than temporary staging dir whereas it is a permanent destination in general sqoop import. It makes sense to me to clean up temporary staging dir when the job was successful because data is safely moved to hive.

        So I think that it is better to handle your suggestion in a separate jira.

        Thoughts?

        Nemon Lou added a comment -

        Is there a security problem with Sqoop removing the export directory?
        If not, how about offering an argument such as "--overwrite true" to support automatically removing the export directory?
        Then we wouldn't need to remove it ourselves.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/
        -----------------------------------------------------------

        (Updated 2012-04-19 05:56:56.889555)

        Review request for Sqoop, Arvind Prabhakar and Cheolsoo Park.

        Changes
        -------

        Patch rebase to current moved repository trunk.

        Summary
        -------

        I've added code that removes the export directory if it is empty.

        (Recreating review on moved SVN repository)

        This addresses bug SQOOP-443.
        https://issues.apache.org/jira/browse/SQOOP-443

        Diffs (updated)
        -----

        /src/java/org/apache/sqoop/hive/HiveImport.java 1327832

        Diff: https://reviews.apache.org/r/4798/diff

        Testing
        -------

        ant -Dhadoopversion={20, 23, 100} test
        real testing environment based on CDH3

        Thanks,

        Jarek

        Jarek Jarcec Cecho added a comment -

        Thanks Cheolsoo for your time. Your vote definitely counts. However, I believe that our current commit policy requires a +1 from another committer in order to get the patch committed.

        Jarcec

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4798/
        -----------------------------------------------------------

        Review request for Sqoop, Arvind Prabhakar and Cheolsoo Park.

        Summary
        -------

        I've added code that removes the export directory if it is empty.

        (Recreating review on moved SVN repository)

        This addresses bug SQOOP-443.
        https://issues.apache.org/jira/browse/SQOOP-443

        Diffs
        -----

        Diff: https://reviews.apache.org/r/4798/diff

        Testing
        -------

        ant -Dhadoopversion={20, 23, 100} test
        real testing environment based on CDH3

        Thanks,

        Jarek

        Cheolsoo Park added a comment -

        I can't open the patch on review board probably due to the recent svn repository migration. Nevertheless, I downloaded and reviewed it in my workspace.

        +1 (if my vote counts)

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3952/
        -----------------------------------------------------------

        Review request for Sqoop, Arvind Prabhakar and Bilung Lee.

        Summary
        -------

        I've added code that removes the export directory if it is empty.

        This addresses bug SQOOP-443.
        https://issues.apache.org/jira/browse/SQOOP-443

        Diffs
        -----

        /src/java/org/apache/sqoop/hive/HiveImport.java 1245157

        Diff: https://reviews.apache.org/r/3952/diff

        Testing
        -------

        ant -Dhadoopversion={20, 23, 100} test
        real testing environment based on CDH3

        Thanks,

        Jarek


          People

          • Assignee:
            Jarek Jarcec Cecho
            Reporter:
            Jarek Jarcec Cecho