Sqoop (Retired) / SQOOP-3207

Codegen is using column labels rather than column names


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.4.5
    • Fix Version/s: None
    • Component/s: codegen
    • Environment:
      • Red Hat Enterprise Linux Server release 6.2 (Santiago)
      • Teradata 14.10.07.21
      • Sqoop 1.4.6
    • Flags: Patch

    Description

      We are using codegen to create external Hive tables.

      Sqoop 1.4.4 used getColumnName() in SqlManager.java and ResultSetPrinter.java. The change made for SQOOP-585 appears to reverse this by trying getColumnLabel() first.

      The issue shows up when column labels (aliases) contain spaces, which is often the case for Teradata sources. When running codegen, the generated Hive CREATE TABLE statement carries the labels, spaces included, through as column names. The desired behavior is to use the underlying column names instead.
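
      The difference between the two lookups can be illustrated with a minimal sketch. This is not Sqoop's actual code: the class name, the helper methods, the fallback rules, and the fixed STRING type are all assumptions made for illustration, and the metadata is a hypothetical stand-in built with java.lang.reflect.Proxy so that no Teradata connection is needed. Only getColumnName(), getColumnLabel() and getColumnCount() are real java.sql.ResultSetMetaData methods.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.StringJoiner;

public class ColumnNamingSketch {

  /** Label-first lookup (assumed rule): use the name only if the label is empty. */
  static String labelFirst(ResultSetMetaData md, int i) throws SQLException {
    String label = md.getColumnLabel(i);
    return (label == null || label.isEmpty()) ? md.getColumnName(i) : label;
  }

  /** Name-first lookup (assumed rule): use the label only if the name is empty. */
  static String nameFirst(ResultSetMetaData md, int i) throws SQLException {
    String name = md.getColumnName(i);
    return (name == null || name.isEmpty()) ? md.getColumnLabel(i) : name;
  }

  /** Builds a Hive-style column list; every column is typed STRING for simplicity. */
  static String hiveColumns(ResultSetMetaData md, boolean useLabel) throws SQLException {
    StringJoiner cols = new StringJoiner(", ");
    for (int i = 1; i <= md.getColumnCount(); i++) {
      String col = useLabel ? labelFirst(md, i) : nameFirst(md, i);
      cols.add("`" + col + "` STRING");
    }
    return cols.toString();
  }

  /** Hypothetical stand-in for driver metadata; answers only the three calls used here. */
  static ResultSetMetaData fakeMetadata(String[] names, String[] labels) {
    return (ResultSetMetaData) Proxy.newProxyInstance(
        ColumnNamingSketch.class.getClassLoader(),
        new Class<?>[] {ResultSetMetaData.class},
        (proxy, method, args) -> {
          switch (method.getName()) {
            case "getColumnCount": return names.length;
            case "getColumnName": return names[(Integer) args[0] - 1]; // JDBC is 1-based
            case "getColumnLabel": return labels[(Integer) args[0] - 1];
            default: throw new UnsupportedOperationException(method.getName());
          }
        });
  }

  public static void main(String[] args) throws SQLException {
    ResultSetMetaData md = fakeMetadata(
        new String[] {"FIRST_NAME", "LAST_NAME"},
        new String[] {"FIRST NAME", "LAST NAME"});
    System.out.println(hiveColumns(md, true));   // `FIRST NAME` STRING, `LAST NAME` STRING
    System.out.println(hiveColumns(md, false));  // `FIRST_NAME` STRING, `LAST_NAME` STRING
  }
}
```

      With the label-first lookup the spaces end up in the Hive column names; with the name-first lookup the underlying column names are used.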

      Example (current output, using column labels):

      CREATE TABLE IF NOT EXISTS `null` ( `FIRST NAME` STRING, `LAST NAME` STRING) COMMENT 'Imported by sqoop on 2017/06/28 14:15:35' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE;
      

      Sqoop 1.4.4 example (using column names):

      CREATE TABLE IF NOT EXISTS `null` ( `FIRST_NAME` STRING, `LAST_NAME` STRING) COMMENT 'Imported by sqoop on 2017/06/28 14:15:35' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE;
      

      Codegen example:

      sqoop codegen -libjars ../lib/tdgssconfig-1.5.1.jar --driver com.teradata.jdbc.TeraDriver --connect jdbc:teradata://my_hostname/DATABASE=my_db --username my_username --password my_password --outdir /users/raphnguyen/my_table/src_generated --bindir /users/raphnguyen/my_table/jar_generated --query "SELECT * FROM MY_DB.MY_TABLE WHERE \$CONDITIONS" --hive-import
      

      Related: SQOOP-585

    People

      Assignee: Unassigned
      Reporter: Raphael Nguyen (raphnguyen)

    Time Tracking

      Original Estimate: 24h
      Remaining Estimate: 24h
      Time Spent: Not Specified
