Sqoop
  1. Sqoop
  2. SQOOP-824

Sqoop code generation in 'update' export mode incompatible with '--columns' option

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.2
    • Fix Version/s: 1.4.3
    • Component/s: codegen, tools
    • Labels:
    • Environment:

      Description

      When using sqoop export with --update-key and --columns the options are incompatible. For example, given table T1

        +------+
        | T1   |
        +------+
        | id   |
        | name |
        | mail |
        | cell |
        +-------
      

      When using:

      $ sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name
      

      the code generation utility creates a Java file which fails to compile.

      This seems to happen because ConnManager appends all table columns, even those not mentioned on the command-line (see ConnManager.configureDbOutputColumns). As a result, the generated class contains a write method which references class members which do not exist.

      I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

      I may be missing something as that code is there for a reason, but I can't see why?

      1. bugSQOOP-824.patch
        5 kB
        Jarek Jarcec Cecho

        Issue Links

          Activity

          Stan Angeloff created issue -
          Stan Angeloff made changes -
          Field Original Value New Value
          Description When using the {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
          +------+
          | T1 |
          +------+
          | id |
          | name |
          | mail |
          | cell |
          +-------
          {code}

          When using {{sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name}} the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
          +------+
          | T1 |
          +------+
          | id |
          | name |
          | mail |
          | cell |
          +-------
          {code}

          When using {{sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name}} the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          Stan Angeloff made changes -
          Description When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
          +------+
          | T1 |
          +------+
          | id |
          | name |
          | mail |
          | cell |
          +-------
          {code}

          When using {{sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name}} the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
            +------+
            | T1 |
            +------+
            | id |
            | name |
            | mail |
            | cell |
            +-------
          {code}

          When using {{sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name}} the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          Stan Angeloff made changes -
          Description When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
            +------+
            | T1 |
            +------+
            | id |
            | name |
            | mail |
            | cell |
            +-------
          {code}

          When using {{sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name}} the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
            +------+
            | T1 |
            +------+
            | id |
            | name |
            | mail |
            | cell |
            +-------
          {code}

          When using:

          {code}
          $ sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name
          {code}

          the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          Stan Angeloff made changes -
          Description When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
            +------+
            | T1 |
            +------+
            | id |
            | name |
            | mail |
            | cell |
            +-------
          {code}

          When using:

          {code}
          $ sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name
          {code}

          the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}. As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          When using {{sqoop export}} with {{\-\-update-key}} and {{--columns}} the options are incompatible. For example, given table {{T1}}

          {code}
            +------+
            | T1 |
            +------+
            | id |
            | name |
            | mail |
            | cell |
            +-------
          {code}

          When using:

          {code}
          $ sqoop export [ .. ] --update-key id --update-mode allowinsert --columns id,name
          {code}

          the code generation utility creates a Java file which fails to compile.

          This seems to happen because {{ConnManager}} appends all table columns, even those not mentioned on the command-line (see {{ConnManager.configureDbOutputColumns}}). As a result, the generated class contains a {{write}} method which references class members which do not exist.

          I could suggest two possible solutions: 1) either make this case an exception or 2) make the code that appends all remaining table columns optional.

          I may be missing something as that code is there for a reason, but I can't see why?
          Jarek Jarcec Cecho made changes -
          Remote Link This issue links to "Review board (Web Link)" [ 11963 ]
          Jarek Jarcec Cecho made changes -
          Attachment bugSQOOP-824.patch [ 12564163 ]
          Jarek Jarcec Cecho made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 1.4.3 [ 12322644 ]
          Hide
          Jarek Jarcec Cecho added a comment -

          Hi Stan,
          thank you very much for reporting this issue. It's an excellent catch!

          Jarcec

          Show
          Jarek Jarcec Cecho added a comment - Hi Stan, thank you very much for reporting this issue. It's an excellent catch! Jarcec
          Jarek Jarcec Cecho made changes -
          Assignee Jarek Jarcec Cecho [ jarcec ]
          Hide
          Stan Angeloff added a comment -

          Jarek,

          Wouldn't it be possible to allow this case rather than throw an exception? I had to work on a PostgreSQL UPSERT support recently and I was able to work around this limitation by accepting --columns and --update-keys on the same command. It was trivial, instead of always grabbing all of the table columns, look for user-supplied ones first:

                // User-supplied column names first.
                String [] colNames = options.getColumns();
                if (null == colNames) {
                  // Table-based export. Read column names from table.
                  colNames = getColumnNames(options.getTableName());
                }
          

          The *UpsertOutputFormat classes may need an update since they output a SQL statement according to how the fields have been ordered in configureDbOutputColumns.

          I am happy to share my versions of PostgresqlManager.java and PostgresqlUpsertOutputFormat.java if that would help. I haven't read up on the official contribution guidelines, but a GitHub repository should suffice?

          Stan

          Show
          Stan Angeloff added a comment - Jarek, Wouldn't it be possible to allow this case rather than throw an exception? I had to work on a PostgreSQL UPSERT support recently and I was able to work around this limitation by accepting --columns and --update-keys on the same command. It was trivial, instead of always grabbing all of the table columns, look for user-supplied ones first: // User-supplied column names first. String [] colNames = options.getColumns(); if ( null == colNames) { // Table-based export. Read column names from table. colNames = getColumnNames(options.getTableName()); } The *UpsertOutputFormat classes may need an update since they output a SQL statement according to how the fields have been ordered in configureDbOutputColumns . I am happy to share my versions of PostgresqlManager.java and PostgresqlUpsertOutputFormat.java if that would help. I haven't read up on the official contribution guidelines, but a GitHub repository should suffice? Stan
          Hide
          Jarek Jarcec Cecho added a comment -

          Hi Stan,
          My patch actually allows this functionality, please check out the linked review board or attached patch.

          Jarcec

          Show
          Jarek Jarcec Cecho added a comment - Hi Stan, My patch actually allows this functionality, please check out the linked review board or attached patch. Jarcec
          Hide
          Stan Angeloff added a comment -

          Oh, how odd. I vaguely remember looking at the patch and thought I saw exceptions being thrown, etc. but it's exactly the opposite, apologies for wasting your time.

          Stan

          Show
          Stan Angeloff added a comment - Oh, how odd. I vaguely remember looking at the patch and thought I saw exceptions being thrown, etc. but it's exactly the opposite, apologies for wasting your time. Stan
          Hide
          Abhijeet Gaikwad added a comment -

          --columns is an import option. I think it is semantically incorrect to use it with exports. Though Sqoop does not err out, none of the user guides suggest it can be used with export command. Let me know.

          ~Abhijeet

          Show
          Abhijeet Gaikwad added a comment - --columns is an import option. I think it is semantically incorrect to use it with exports. Though Sqoop does not err out, none of the user guides suggest it can be used with export command. Let me know. ~Abhijeet
          Hide
          Jarek Jarcec Cecho added a comment -

          Hi Abhijeet,
          actually --columns parameter was extended to be supported in Export mode in addition to import in SQOOP-503.

          The reasoning behind SQOOP-503 is quite simple - Sqoop on export checks the table in database and assumes that input data have all the columns. This unfortunately won't be always true - the table might simple change or user might use import with --columns to import only column subset. In such case, user can't easily get the data back to database.

          You might imagine following use case - users are importing data to hadoop from MySQL for backups. On demand they are moving this data back to MySQL for ad-hoc analysis. In case that they've added new column to their table, this process will not be straightforward as user might need extra step for some temporary tables or some other workaround. With --columns parameter, they do not have to do that.

          Jarcec

          Btw - You're right, this usage is not documented and I believe that it's a bug that we should fix.

          Show
          Jarek Jarcec Cecho added a comment - Hi Abhijeet, actually --columns parameter was extended to be supported in Export mode in addition to import in SQOOP-503 . The reasoning behind SQOOP-503 is quite simple - Sqoop on export checks the table in database and assumes that input data have all the columns. This unfortunately won't be always true - the table might simple change or user might use import with --columns to import only column subset. In such case, user can't easily get the data back to database. You might imagine following use case - users are importing data to hadoop from MySQL for backups. On demand they are moving this data back to MySQL for ad-hoc analysis. In case that they've added new column to their table, this process will not be straightforward as user might need extra step for some temporary tables or some other workaround. With --columns parameter, they do not have to do that. Jarcec Btw - You're right, this usage is not documented and I believe that it's a bug that we should fix.
          Hide
          Abhijeet Gaikwad added a comment -

          Thanks Jarcec - that clarifies.

          Show
          Abhijeet Gaikwad added a comment - Thanks Jarcec - that clarifies.
          Hide
          Abhijeet Gaikwad added a comment -

          Thanks Stan and Jarcec for your contributions.

          Show
          Abhijeet Gaikwad added a comment - Thanks Stan and Jarcec for your contributions.
          Abhijeet Gaikwad made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop200 #507 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/507/)
          SQOOP-824: Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da)

          Result = SUCCESS
          abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da
          Files :

          • src/test/com/cloudera/sqoop/TestExportUpdate.java
          • src/java/org/apache/sqoop/manager/ConnManager.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop200 #507 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/507/ ) SQOOP-824 : Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da) Result = SUCCESS abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da Files : src/test/com/cloudera/sqoop/TestExportUpdate.java src/java/org/apache/sqoop/manager/ConnManager.java
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop23 #712 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/712/)
          SQOOP-824: Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da)

          Result = SUCCESS
          abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da
          Files :

          • src/test/com/cloudera/sqoop/TestExportUpdate.java
          • src/java/org/apache/sqoop/manager/ConnManager.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop23 #712 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/712/ ) SQOOP-824 : Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da) Result = SUCCESS abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da Files : src/test/com/cloudera/sqoop/TestExportUpdate.java src/java/org/apache/sqoop/manager/ConnManager.java
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop20 #517 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/517/)
          SQOOP-824: Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da)

          Result = SUCCESS
          abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da
          Files :

          • src/java/org/apache/sqoop/manager/ConnManager.java
          • src/test/com/cloudera/sqoop/TestExportUpdate.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop20 #517 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/517/ ) SQOOP-824 : Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da) Result = SUCCESS abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da Files : src/java/org/apache/sqoop/manager/ConnManager.java src/test/com/cloudera/sqoop/TestExportUpdate.java
          Hide
          Hudson added a comment -

          Integrated in Sqoop-ant-jdk-1.6-hadoop100 #504 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/504/)
          SQOOP-824: Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da)

          Result = SUCCESS
          abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da
          Files :

          • src/java/org/apache/sqoop/manager/ConnManager.java
          • src/test/com/cloudera/sqoop/TestExportUpdate.java
          Show
          Hudson added a comment - Integrated in Sqoop-ant-jdk-1.6-hadoop100 #504 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/504/ ) SQOOP-824 : Sqoop code generation in 'update' export mode incompatible with '--columns' option (Revision 9a3fd1484f7ff713a323f5687cb9681fc0c524da) Result = SUCCESS abhijeet : https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=9a3fd1484f7ff713a323f5687cb9681fc0c524da Files : src/java/org/apache/sqoop/manager/ConnManager.java src/test/com/cloudera/sqoop/TestExportUpdate.java
          Hide
          Ankit added a comment -

          sqoop-export
          --connect jdbc:postgresql://<ip>/<database_name>
          --connection-manager com.cloudera.sqoop.manager.PostgresqlManager
          --username <name>
          --password <pass>
          --table <table_name>
          --update-key 'primary_key'
          --columns '<primary_key_column>,<col_1_tobeupdated>,<col_2_tobeupdated>'
          --export-dir <export-director>
          -input-fields-terminated-by ',' -m 1
          – --schema test_data
          ___________
          sample file data :

          871888,60.00
          ___________
          871888 = colmn that goes in where clause
          60.00 = woulbe upated in the column .

          Hope this help i have invested lot of time apache sqoop dont have a proper documentation so i am sharing that it could help other .

          cheers .

          Show
          Ankit added a comment - sqoop-export --connect jdbc:postgresql://<ip>/<database_name> --connection-manager com.cloudera.sqoop.manager.PostgresqlManager --username <name> --password <pass> --table <table_name> --update-key 'primary_key' --columns '<primary_key_column>,<col_1_tobeupdated>,<col_2_tobeupdated>' --export-dir <export-director> -input-fields-terminated-by ',' -m 1 – --schema test_data ___________ sample file data : 871888,60.00 ___________ 871888 = colmn that goes in where clause 60.00 = woulbe upated in the column . Hope this help i have invested lot of time apache sqoop dont have a proper documentation so i am sharing that it could help other . cheers .
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          4h 3m 1 Jarek Jarcec Cecho 10/Jan/13 13:44
          Patch Available Patch Available Resolved Resolved
          10d 1h 56m 1 Abhijeet Gaikwad 20/Jan/13 15:41

            People

            • Assignee:
              Jarek Jarcec Cecho
              Reporter:
              Stan Angeloff
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development