Hadoop Map/Reduce / MAPREDUCE-674

Sqoop should allow a "where" clause to avoid having to export entire tables

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Sqoop currently only exports at the granularity of a table. This doesn't work well on systems with large tables, where the overhead of performing a full dump each time is significant. Letting the user specify a "where" clause is a relatively simple change that would give Sqoop much more flexibility.

      1. HADOOP-6115.patch
        13 kB
        Kevin Weil
      2. MAPREDUCE-674.patch
        13 kB
        Kevin Weil

        Activity

        Kevin Weil added a comment -

        I have this working except in the case of optimized local importing. I'm not sure why that case is failing at the moment, because the same statement succeeds when run from the command line, but I expect to have a patch by the end of the weekend.

        Kevin Weil added a comment -

        This patch adds the ability to specify a "where" clause to sqoop, in the form

        <normal sqoop command> --where "my_row > 0 and my_text like '%sqoop%'"

        It also works with sqoop's mysql-specific --local flag, using mysqldump's -w flag.
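
        As a rough illustration of the two code paths described in this comment, the sketch below shows how a user-supplied clause could be appended to a generated SELECT statement and how the same clause could be handed to mysqldump through its -w option. The function names build_select and build_mysqldump_args are hypothetical and are not taken from the patch; only the --where argument and mysqldump's -w flag come from the comment above. The clause is passed through verbatim, so this assumes it comes from a trusted operator.

```python
def build_select(table, columns, where=None):
    """Build a SELECT statement for a table, optionally filtered.

    Hypothetical helper for illustration; not Sqoop's actual code.
    """
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        # The clause is appended exactly as the user typed it after --where.
        query += " WHERE " + where
    return query


def build_mysqldump_args(database, table, where=None):
    """Build an argument list for mysqldump, optionally filtered.

    Hypothetical helper for illustration; it relies only on mysqldump's
    -w (--where) option, which restricts the dump to matching rows.
    """
    args = ["mysqldump"]
    if where:
        # Passing the clause as its own list element avoids shell quoting issues.
        args += ["-w", where]
    args += [database, table]
    return args


if __name__ == "__main__":
    clause = "my_row > 0 and my_text like '%sqoop%'"
    print(build_select("my_table", ["my_row", "my_text"], clause))
    print(build_mysqldump_args("my_db", "my_table", clause))
```

        Keeping the clause optional in both helpers means the unfiltered full-table behavior stays the default when --where is not given.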

        Aaron Kimball added a comment -

        +1 – patch looks good to me. Thanks for cleaning up my dangling imports.

        I wouldn't mind if you felt like being ambitious and cleaned up that TODO in TestWhere.java by either pushing code down into ImportJobTestCase or subclassing it for more specific behavior. But I think this is good as-is, too.

        Kevin Weil added a comment -

        Updated the patch to be against the mapreduce svn tree now that it's split. No code changes since the original patch otherwise.

        Aaron Kimball added a comment -

        Cycling patch status to get Hudson to enqueue this for testing...

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12412131/MAPREDUCE-674.patch
        against trunk revision 790971.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/352/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/352/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/352/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/352/console

        This message is automatically generated.

        Aaron Kimball added a comment -

        Test failures are all unrelated.

        Tom White added a comment -

        I've just committed this. Thanks Kevin!

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #15 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/).
        Sqoop should allow a "where" clause to avoid having to export entire tables. Contributed by Kevin Weil.


          People

          • Assignee: Kevin Weil
          • Reporter: Kevin Weil
          • Votes: 0
          • Watchers: 4