SQOOP-474

Split-by specification incorrectly triggers bounding value query

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.1-incubating
    • Fix Version/s: 1.4.2
    • Component/s: build, connectors/generic
    • Labels:
      None

      Description

      To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:

      $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
      

      This import will output the following:

      12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE  (1 = 1) ) AS t1
      

      The boundary query embeds the user's query as a subquery. This fails in DB2 when the query uses the 'with ur' syntax, and it also fails in Informix versions that do not support embedded queries. The issue is the 'with ur' syntax; without it, the boundary query is harmless. The boundary query is triggered by the --split-by specification, even though specifying --split-by is redundant when the number of mappers is 1.
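
The log line above suggests how the bounding query is assembled: the $CONDITIONS token in the free-form query is replaced by a trivially true predicate, and the result is wrapped as a subquery from which the MIN and MAX of the split column are selected. A minimal Java sketch of that composition follows. The class, method, and constant names are hypothetical, and the logic is reconstructed from the logged output rather than taken from Sqoop's source.

```java
/** Hypothetical illustration; not Sqoop's actual implementation. */
class BoundingQuerySketch {

    /** Placeholder token that appears in the free-form --query text. */
    static final String CONDITIONS_TOKEN = "$CONDITIONS";

    /**
     * Wraps the user's query in a subquery and selects the MIN and MAX of
     * the split column, mirroring the shape of the logged BoundingValsQuery.
     */
    static String boundingQuery(String splitCol, String query) {
        // String.replace performs literal (non-regex) replacement, so '$' is safe.
        String inner = query.replace(CONDITIONS_TOKEN, " (1 = 1) ");
        return "SELECT MIN(" + splitCol + "), MAX(" + splitCol + ") FROM ("
            + inner + ") AS t1";
    }

    public static void main(String[] args) {
        String query =
            "SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS";
        System.out.println(boundingQuery("AID", query));
    }
}
```

Because the user's query ends up inside the parentheses, any trailing clause that is only valid at the end of a top-level statement (such as DB2's 'with ur') lands in the middle of the generated SQL, which is consistent with the failure described here.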

      1. SQOOP-474.patch
        0.7 kB
        Kathleen Ting
      2. SQOOP-474-1.patch
        0.5 kB
        Kathleen Ting

        Activity

        Kathleen Ting created issue -
        Kathleen Ting made changes -
        Field Original Value New Value
        Attachment SQOOP-474.patch [ 12521051 ]
        Kathleen Ting made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4614/
        -----------------------------------------------------------

        Review request for Sqoop and Arvind Prabhakar.

        Summary
        -------

        Before triggering the bounding value query construction, in addition to checking that the user has specified a split by option, also take into account that the number of mappers is 1.

        This addresses bug SQOOP-474.
        https://issues.apache.org/jira/browse/SQOOP-474

        Diffs


        ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1308530

        Diff: https://reviews.apache.org/r/4614/diff

        Testing
        -------

        Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).

        Thanks,

        Kathleen

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4614/#review6664
        -----------------------------------------------------------

        Hi Kate, your patch looks good. I only want to mention that I made some changes to the same area of code in SQOOP-468, which Jarec is going to submit soon. So you will need to rebase your patch once it is committed.

        In my patch, I factored out that area of code into a separate function (buildBoundaryQuery). To achieve what you're doing here, you can change the following line:

        private String buildBoundaryQuery(String col, String query) {
          if (col == null) { // change to --> if (col == null || options.getNumMappers() == 1) {
            return "";
          }
          ...
        }

        I have tested this in my workspace by myself and seen no issues. Please let me know if you have any concerns/questions.

        • Cheolsoo
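
For illustration, the suggested guard can be written out as a self-contained Java sketch. Only buildBoundaryQuery and options.getNumMappers() come from the comment above; the Options class is a hypothetical stand-in for Sqoop's options object, and the elided remainder of the method is replaced by a placeholder. This shows the shape of the check, not the committed patch.

```java
/** Hypothetical sketch of the guard discussed above; not the committed patch. */
class BoundaryGuardSketch {

    /** Minimal stand-in for the import options (assumed, not Sqoop's class). */
    static final class Options {
        private final int numMappers;
        Options(int numMappers) { this.numMappers = numMappers; }
        int getNumMappers() { return numMappers; }
    }

    private final Options options;

    BoundaryGuardSketch(Options options) { this.options = options; }

    /**
     * Returns an empty string (treated here as "no boundary query") when
     * there is no split column or when only one mapper is requested.
     */
    String buildBoundaryQuery(String col, String query) {
        if (col == null || options.getNumMappers() == 1) {
            return "";
        }
        // Placeholder for the elided remainder of the real method.
        return "SELECT MIN(" + col + "), MAX(" + col + ") FROM (" + query + ") AS t1";
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM A WHERE  (1 = 1) ";
        // One mapper: the guard short-circuits even though a split column is given.
        System.out.println(new BoundaryGuardSketch(new Options(1))
            .buildBoundaryQuery("AID", query).isEmpty());
        // Several mappers: a boundary query is still produced.
        System.out.println(new BoundaryGuardSketch(new Options(4))
            .buildBoundaryQuery("AID", query).isEmpty());
    }
}
```

Placing the mapper-count check in the same condition as the null check keeps a single early-return path, so callers that already handle an empty result for a missing split column need no further changes.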

        On 2012-04-02 22:23:54, Kathleen Ting wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4614/

        -----------------------------------------------------------

        (Updated 2012-04-02 22:23:54)

        Review request for Sqoop and Arvind Prabhakar.

        Summary

        -------

        Before triggering the bounding value query construction, in addition to checking that the user has specified a split by option, also take into account that the number of mappers is 1.

        This addresses bug SQOOP-474.

        https://issues.apache.org/jira/browse/SQOOP-474

        Diffs

        -----

        ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1308530

        Diff: https://reviews.apache.org/r/4614/diff

        Testing

        -------

        Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).

        Thanks,

        Kathleen

        Kathleen Ting made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        jiraposter@reviews.apache.org added a comment -

        On 2012-04-03 20:53:06, Cheolsoo Park wrote:

        > Hi Kate, your patch looks good. I only want to mention that I made some changes to the same area of code in SQOOP-468, which Jarec is going to submit soon. So you will need to rebase your patch once it is committed.
        >
        > In my patch, I factored out that area of code into a separate function (buildBoundaryQuery). To achieve what you're doing here, you can change the following line:
        >
        > private String buildBoundaryQuery(String col, String query) {
        >   if (col == null) { // change to --> if (col == null || options.getNumMappers() == 1) {
        >     return "";
        >   }
        >   ...
        > }
        >
        > I have tested this in my workspace by myself and seen no issues. Please let me know if you have any concerns/questions.

        Thanks Cheolsoo. I've rebased SQOOP-474 on your SQOOP-468.

        • Kathleen

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4614/#review6664
        -----------------------------------------------------------


        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4614/
        -----------------------------------------------------------

        (Updated 2012-04-05 22:14:12.420696)

        Review request for Sqoop and Arvind Prabhakar.

        Changes
        -------

        Rebased on SQOOP-468

        Summary
        -------

        Before triggering the bounding value query construction, in addition to checking that the user has specified a split by option, also take into account that the number of mappers is 1.

        This addresses bug SQOOP-474.
        https://issues.apache.org/jira/browse/SQOOP-474

        Diffs (updated)


        ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1309506

        Diff: https://reviews.apache.org/r/4614/diff

        Testing
        -------

        Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).

        Thanks,

        Kathleen

        Kathleen Ting added a comment -

        Rebased on SQOOP-468.

        Kathleen Ting made changes -
        Attachment SQOOP-474-1.patch [ 12521592 ]
        Kathleen Ting made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4614/
        -----------------------------------------------------------

        (Updated 2012-04-05 22:14:27.818185)

        Review request for Sqoop and Bilung Lee.

        Summary
        -------

        Before triggering the bounding value query construction, in addition to checking that the user has specified a split by option, also take into account that the number of mappers is 1.

        This addresses bug SQOOP-474.
        https://issues.apache.org/jira/browse/SQOOP-474

        Diffs


        ./src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 1309506

        Diff: https://reviews.apache.org/r/4614/diff

        Testing
        -------

        Ran unit tests. Confirmed that, with the fix, the console output does not have the boundary query in it (i.e. INFO db.DataDrivenDBInputFormat: BoundingValsQuery).

        Thanks,

        Kathleen

        Kathleen Ting made changes -
        Description To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
        {code}
        $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
        {code}

        This import will output the following:
        {code}
        12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE (1 = 1) ) AS t1
        {code}

        The problem is that the bounding value query construction is being triggered because of the --split-by specification. However specifying split-by is redundant given that the number of mappers is 1.
        To reproduce this, run an import using a query with number of mappers set to 1 and a split-by specification. For example:
        {code}
        $ sqoop import --connect jdbc:mysql://localhost/hadoopguide --query 'SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE $CONDITIONS' --split-by AID --target-dir /user/kateting/test1 --m=1
        {code}

        This import will output the following:
        {code}
        12/04/02 13:29:59 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AID), MAX(AID) FROM (SELECT A.*, B.* FROM A JOIN B ON (A.AID = B.BID) WHERE (1 = 1) ) AS t1
        {code}

        An embedded query fails in DB2 when using the 'with ur' syntax. This also fails for Informix if the version of Informix doesn't support embedded queries. The issue is the 'with ur' syntax, without which, the boundary query is harmless. The boundary query is being triggered because of the split-by specification. However specifying split-by is redundant given that the number of mappers is 1.
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4614/#review6721
        -----------------------------------------------------------

        Ship it!

        • Bilung


        Bilung Lee added a comment -

        Patch is committed. Thanks, Kathleen!

        Bilung Lee made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Kathleen Ting made changes -
        Fix Version/s 1.4.2-incubating [ 12320141 ]
        Affects Version/s 1.4.1-incubating [ 12318902 ]
        Affects Version/s 1.4.2-incubating [ 12320141 ]
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6 #107 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6/107/)
        SQOOP-474 Split-by specification incorrectly triggers bounding value query (Revision 1310129)

        Result = SUCCESS
        blee :
        Files :

        • /sqoop/trunk/src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java

          People

          • Assignee:
            Kathleen Ting
            Reporter:
            Kathleen Ting