Sqoop
  1. Sqoop
  2. SQOOP-976

Incorrect SQL when incremental criteria is text column

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.3
    • Fix Version/s: 1.4.4
    • Component/s: tools
    • Labels:
      None
    • Environment:

      incremental import on table using text column

    1. sqoop-976-2.diff
      7 kB
      Raghav Kumar Gautam
    2. Fix_for_Sqoop-976.diff
      6 kB
      Raghav Kumar Gautam

      Issue Links

        Activity

        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Waldyn Benbenek,
        would you mind sharing more about your use case? As shared on the email, I'm afraid that using incremental import that is based on text column might not lead to expected results, so I would like to understand what you're trying to achieve before doing any changes.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Waldyn Benbenek , would you mind sharing more about your use case? As shared on the email , I'm afraid that using incremental import that is based on text column might not lead to expected results, so I would like to understand what you're trying to achieve before doing any changes. Jarcec
        Hide
        Waldyn Benbenek added a comment -

        Jarek,

        The use case is a simple one. The user wants to regularly import records from a table. The key column (the only column with consistent data that can be recognized and sorted) is character based. The table I tried on has the column CUSTMERID with the following values:

        CUST1
        CUST2
        CUST3
        CUST4
        .
        .
        CUST8

        This is obviously limited. To make this work in a table with new records being added continuously, one needs a scheme like:

        CUST000000001
        Etc.
        Our customers have key columns that are character based quite often.

        Anyway, the problem is the code which generates the query to specify the range does not take character values into account.

        It is a simple change in ImportTool around line 300

        if (checkColumnType == Types.CHAR)

        { prevEndpoint="'"+prevEndpoint+"'"; nextIncrementalValue = "'"+nextIncrementalValue+"'"; }

        Everything works fine with that and it does not affect other cases.

        Thanks,

        Wally Benbenek

        Show
        Waldyn Benbenek added a comment - Jarek, The use case is a simple one. The user wants to regularly import records from a table. The key column (the only column with consistent data that can be recognized and sorted) is character based. The table I tried on has the column CUSTMERID with the following values: CUST1 CUST2 CUST3 CUST4 . . CUST8 This is obviously limited. To make this work in a table with new records being added continuously, one needs a scheme like: CUST000000001 Etc. Our customers have key columns that are character based quite often. Anyway, the problem is the code which generates the query to specify the range does not take character values into account. It is a simple change in ImportTool around line 300 if (checkColumnType == Types.CHAR) { prevEndpoint="'"+prevEndpoint+"'"; nextIncrementalValue = "'"+nextIncrementalValue+"'"; } Everything works fine with that and it does not affect other cases. Thanks, Wally Benbenek
        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Waldyn Benbenek,
        thank you very much for your feedback. I do understand the simplicity of the fix. My concern is that I'm not entirely convinced that we should allow this in the first place.

        As you've mentioned using scheme like CUST1 ... CUST8 will fail as soon as someone will accidentally insert CUST11. The database itself won't prevent anyone from doing so and Sqoop will never see this row as newly inserted. Which at the end might lead to a data corruption. Also comparing strings is not efficient from the database perspective and it's actually quite dangerous due to various encoding that can be applied (and especially when they are changed). We've experienced a lot of troubles with TextSplitter in the past, so I'm afraid that by allowing using string based columns for incremental updates, we're just opening another Pandora box.

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Waldyn Benbenek , thank you very much for your feedback. I do understand the simplicity of the fix. My concern is that I'm not entirely convinced that we should allow this in the first place. As you've mentioned using scheme like CUST1 ... CUST8 will fail as soon as someone will accidentally insert CUST11 . The database itself won't prevent anyone from doing so and Sqoop will never see this row as newly inserted. Which at the end might lead to a data corruption. Also comparing strings is not efficient from the database perspective and it's actually quite dangerous due to various encoding that can be applied (and especially when they are changed). We've experienced a lot of troubles with TextSplitter in the past, so I'm afraid that by allowing using string based columns for incremental updates, we're just opening another Pandora box. Jarcec
        Hide
        Jarek Jarcec Cecho added a comment -

        More discussion about this jira is available in dev mailing list. It seems that we have an agreement that we should fail quickly rather than generating incorrect query and letting users deal with random SQL Exceptions.

        Show
        Jarek Jarcec Cecho added a comment - More discussion about this jira is available in dev mailing list . It seems that we have an agreement that we should fail quickly rather than generating incorrect query and letting users deal with random SQL Exceptions.
        Hide
        Raghav Kumar Gautam added a comment -

        Attaching a fix for this JIRA.

        Show
        Raghav Kumar Gautam added a comment - Attaching a fix for this JIRA.
        Hide
        Raghav Kumar Gautam added a comment -

        Patch submitted - see attachment.

        Show
        Raghav Kumar Gautam added a comment - Patch submitted - see attachment.
        Hide
        Jarek Jarcec Cecho added a comment -

        Hi Raghav Kumar Gautam,
        thank you very much for taking this one! Would you mind uploading your patch to the review board?

        Jarcec

        Show
        Jarek Jarcec Cecho added a comment - Hi Raghav Kumar Gautam , thank you very much for taking this one! Would you mind uploading your patch to the review board ? Jarcec
        Hide
        Raghav Kumar Gautam added a comment -
        Show
        Raghav Kumar Gautam added a comment - It is up at https://reviews.apache.org/r/11524/ .
        Hide
        Raghav Kumar Gautam added a comment -

        Uploading new patch - with changes suggested by Jarek.

        Show
        Raghav Kumar Gautam added a comment - Uploading new patch - with changes suggested by Jarek.
        Hide
        ASF subversion and git services added a comment -

        Commit 658c15848f466e020975d91114120900dd0f9cbf in branch refs/heads/trunk from Jarek Jarcec Cecho
        [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=658c158 ]

        SQOOP-976: Incorrect SQL when incremental criteria is text column

        (Raghav Kumar Gautam via Jarek Jarcec Cecho)

        Show
        ASF subversion and git services added a comment - Commit 658c15848f466e020975d91114120900dd0f9cbf in branch refs/heads/trunk from Jarek Jarcec Cecho [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=658c158 ] SQOOP-976 : Incorrect SQL when incremental criteria is text column (Raghav Kumar Gautam via Jarek Jarcec Cecho)
        Hide
        Jarek Jarcec Cecho added a comment -

        Thank you Raghav for your contribution!

        Show
        Jarek Jarcec Cecho added a comment - Thank you Raghav for your contribution!
        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop20 #672 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/672/)
        SQOOP-976: Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf
        Files :

        • src/java/org/apache/sqoop/manager/ConnManager.java
        • src/java/org/apache/sqoop/tool/ImportTool.java
        • src/docs/user/import.txt
        • src/test/com/cloudera/sqoop/TestIncrementalImport.java
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop20 #672 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/672/ ) SQOOP-976 : Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf Files : src/java/org/apache/sqoop/manager/ConnManager.java src/java/org/apache/sqoop/tool/ImportTool.java src/docs/user/import.txt src/test/com/cloudera/sqoop/TestIncrementalImport.java
        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop100 #664 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/664/)
        SQOOP-976: Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf
        Files :

        • src/docs/user/import.txt
        • src/test/com/cloudera/sqoop/TestIncrementalImport.java
        • src/java/org/apache/sqoop/manager/ConnManager.java
        • src/java/org/apache/sqoop/tool/ImportTool.java
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop100 #664 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/664/ ) SQOOP-976 : Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf Files : src/docs/user/import.txt src/test/com/cloudera/sqoop/TestIncrementalImport.java src/java/org/apache/sqoop/manager/ConnManager.java src/java/org/apache/sqoop/tool/ImportTool.java
        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop200 #675 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/675/)
        SQOOP-976: Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf
        Files :

        • src/java/org/apache/sqoop/tool/ImportTool.java
        • src/docs/user/import.txt
        • src/test/com/cloudera/sqoop/TestIncrementalImport.java
        • src/java/org/apache/sqoop/manager/ConnManager.java
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop200 #675 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/675/ ) SQOOP-976 : Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf Files : src/java/org/apache/sqoop/tool/ImportTool.java src/docs/user/import.txt src/test/com/cloudera/sqoop/TestIncrementalImport.java src/java/org/apache/sqoop/manager/ConnManager.java
        Hide
        Hudson added a comment -

        Integrated in Sqoop-ant-jdk-1.6-hadoop23 #866 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/866/)
        SQOOP-976: Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf)

        Result = SUCCESS
        jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf
        Files :

        • src/test/com/cloudera/sqoop/TestIncrementalImport.java
        • src/java/org/apache/sqoop/tool/ImportTool.java
        • src/java/org/apache/sqoop/manager/ConnManager.java
        • src/docs/user/import.txt
        Show
        Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop23 #866 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/866/ ) SQOOP-976 : Incorrect SQL when incremental criteria is text column (Revision 658c15848f466e020975d91114120900dd0f9cbf) Result = SUCCESS jarcec : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=658c15848f466e020975d91114120900dd0f9cbf Files : src/test/com/cloudera/sqoop/TestIncrementalImport.java src/java/org/apache/sqoop/tool/ImportTool.java src/java/org/apache/sqoop/manager/ConnManager.java src/docs/user/import.txt

          People

          • Assignee:
            Raghav Kumar Gautam
            Reporter:
            Waldyn Benbenek
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development