SQOOP-3046: Add support for (import + --hcatalog* + --as-parquetfile)


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: hive-integration
    • Labels: None

      Description

      This is a request to identify a way to support Sqoop import with the --hcatalog* options when writing Parquet data files. The test case below demonstrates the current failure.

      CODE SNIP

      ../MapredParquetOutputFormat.java	
      69  @Override
      70  public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
      71      final FileSystem ignored,
      72      final JobConf job,
      73      final String name,
      74      final Progressable progress
      75      ) throws IOException {
      76    throw new RuntimeException("Should never be used");
      77  }
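
      For context on the failure: HCatalog's FileOutputFormatContainer invokes the old mapred-API getRecordWriter(), which Hive's MapredParquetOutputFormat deliberately leaves unimplemented because Hive itself writes Parquet through getHiveRecordWriter(). A minimal, hypothetical sketch of what bridging the two APIs could look like is below; the class name, the empty table Properties, and the package locations (which vary across Hive versions) are assumptions for illustration only, not a proposed patch.

      import java.io.IOException;
      import java.util.Properties;

      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
      import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
      import org.apache.hadoop.hive.serde2.io.ParquetHiveRecord;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.RecordWriter;
      import org.apache.hadoop.mapred.Reporter;
      import org.apache.hadoop.util.Progressable;

      // Hypothetical sketch only: adapts the mapred getRecordWriter() call that
      // HCatalog makes to the getHiveRecordWriter() that Hive's Parquet output
      // format actually implements.
      public class ParquetRecordWriterBridge extends MapredParquetOutputFormat {
        @Override
        public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
            final FileSystem ignored, final JobConf job, final String name,
            final Progressable progress) throws IOException {
          // Delegate to the Hive-side writer instead of throwing.
          // NOTE: real table properties (column names/types) would be required
          // here; an empty Properties object is a placeholder.
          final FileSinkOperator.RecordWriter hiveWriter = getHiveRecordWriter(
              job, new Path(name), ParquetHiveRecord.class, false,
              new Properties(), progress);
          return new RecordWriter<Void, ParquetHiveRecord>() {
            @Override
            public void write(Void key, ParquetHiveRecord value) throws IOException {
              hiveWriter.write(value);   // ParquetHiveRecord is a Writable
            }
            @Override
            public void close(Reporter reporter) throws IOException {
              hiveWriter.close(false);   // false = normal (non-aborted) close
            }
          };
        }
      }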
      

      TEST CASE:

      STEP 01 - Create and Describe MySQL Table
      
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
      ---------------------------------------------------------------------------------------------------------
      | Field                | Type                 | Null | Key | Default              | Extra                | 
      ---------------------------------------------------------------------------------------------------------
      | c_int                | int(11)              | YES |     | (null)               |                      | 
      | c_date               | date                 | YES |     | (null)               |                      | 
      | c_timestamp          | timestamp            | NO  |     | CURRENT_TIMESTAMP    | on update CURRENT_TIMESTAMP | 
      ---------------------------------------------------------------------------------------------------------
      
      STEP 02 - Insert and Select Row
      
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
      sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
      --------------------------------------------------
      | c_int       | c_date     | c_timestamp         | 
      --------------------------------------------------
      | 1           | 2016-10-26 | 2016-10-26 14:30:33.0 | 
      --------------------------------------------------
      
      STEP 03 - Drop Hive Table and Import with (--hcatalog* + 'stored as parquet')
      
      beeline -u jdbc:hive2:// -e "use default; drop table t1"
      sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1
      
      [sqoop console debug]
      16/11/02 20:25:15 INFO mapreduce.Job: Task Id : attempt_1478089149450_0046_m_000000_0, Status : FAILED
      Error: java.lang.RuntimeException: Should never be used
      	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
      	at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
      	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      
      [yarn maptask debug]	
      2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1
      2016-11-02 20:25:15,583 DEBUG [main] org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL
      2016-11-02 20:25:15,613 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
      2016-11-02 20:25:15,614 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
      2016-11-02 20:25:15,620 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
      2016-11-02 20:25:15,633 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Should never be used
      	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
      	at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
      	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
      	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
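      
      For comparison, the direct HDFS Parquet path (--as-parquetfile with --target-dir, no --hcatalog* options) does not go through HCatalog's FileOutputFormatContainer; it is this combination that the request asks to make available via --hcatalog*. The target directory below is illustrative only:
      
      sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --as-parquetfile --target-dir /user/hive/t1_parquet --num-mappers 1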
      

              People

              • Assignee: Unassigned
              • Reporter: Markus Kemper (markuskemper@me.com)
              • Votes: 0
              • Watchers: 2
