Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This is a request to identify a way to support Sqoop import with --hcatalog options when writing Parquet data files. The test case below demonstrates the issue.
CODE SNIP
../MapredParquetOutputFormat.java

69   @Override
70   public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
71       final FileSystem ignored,
72       final JobConf job,
73       final String name,
74       final Progressable progress
75       ) throws IOException {
76     throw new RuntimeException("Should never be used");
77   }
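For context, Hive's own Parquet write path goes through HiveOutputFormat.getHiveRecordWriter(...), which MapredParquetOutputFormat does implement; the mapred-style getRecordWriter(...) above is a deliberate stub, and HCatalog is the component that ends up calling it. Below is a minimal, hypothetical sketch (the class name ParquetHcatRepro is ours, not from this report) that reproduces the same RuntimeException by invoking the stubbed method directly, assuming the Hive and Hadoop jars are on the classpath:

import org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical reproduction (not part of the report): calling the mapred-style
// getRecordWriter() directly, the way HCatalog's FileOutputFormatContainer does,
// fails immediately with java.lang.RuntimeException: "Should never be used".
public class ParquetHcatRepro {
  public static void main(String[] args) throws Exception {
    MapredParquetOutputFormat outputFormat = new MapredParquetOutputFormat();
    // The stub throws before inspecting its arguments, so a bare JobConf and
    // null FileSystem/Progressable are enough to trigger it.
    outputFormat.getRecordWriter(null, new JobConf(), "part-m-00000", null);
  }
}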
TEST CASE:
STEP 01 - Create MySQL Tables

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
--------------------------------------------------------------------------------------------------
| Field       | Type      | Null | Key | Default           | Extra                       |
--------------------------------------------------------------------------------------------------
| c_int       | int(11)   | YES  |     | (null)            |                             |
| c_date      | date      | YES  |     | (null)            |                             |
| c_timestamp | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
--------------------------------------------------------------------------------------------------

STEP 02 - Insert and Select Row

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
--------------------------------------------------
| c_int | c_date     | c_timestamp           |
--------------------------------------------------
| 1     | 2016-10-26 | 2016-10-26 14:30:33.0 |
--------------------------------------------------

STEP 03 - Drop Hive Table and Run Sqoop Import with --hcatalog into a Parquet Table

beeline -u jdbc:hive2:// -e "use default; drop table t1"
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1

[sqoop console debug]
16/11/02 20:25:15 INFO mapreduce.Job: Task Id : attempt_1478089149450_0046_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Should never be used
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
	at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

[yarn maptask debug]
2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1
2016-11-02 20:25:15,583 DEBUG [main] org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL
2016-11-02 20:25:15,613 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-11-02 20:25:15,614 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2016-11-02 20:25:15,620 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2016-11-02 20:25:15,633 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Should never be used
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
	at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
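For readers less familiar with the HCatalog write path, the failure is a delegation mismatch: HCatOutputFormat hands record writing to a file-output container that only speaks the mapred OutputFormat contract, while MapredParquetOutputFormat stubs that method and only provides a working implementation for the Hive-specific getHiveRecordWriter(...). The class below is a simplified, hypothetical stand-in (names and structure are ours, not the actual FileOutputFormatContainer source) illustrating the delegation that ends in the stubbed method:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.util.Progressable;

// Hypothetical, simplified stand-in for the HCatalog container: it forwards
// getRecordWriter() to the table's underlying mapred OutputFormat. When that
// format is MapredParquetOutputFormat, the forwarded call is the stub shown in
// the code snippet above, so every map task fails before writing a record.
class SimplifiedOutputFormatContainer<K, V> implements OutputFormat<K, V> {
  private final OutputFormat<K, V> baseOutputFormat;

  SimplifiedOutputFormatContainer(OutputFormat<K, V> baseOutputFormat) {
    this.baseOutputFormat = baseOutputFormat;
  }

  @Override
  public RecordWriter<K, V> getRecordWriter(FileSystem fs, JobConf job,
      String name, Progressable progress) throws IOException {
    // There is no hook here for the Hive-only getHiveRecordWriter(...), which
    // is the method MapredParquetOutputFormat actually implements.
    return baseOutputFormat.getRecordWriter(fs, job, name, progress);
  }

  @Override
  public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
    baseOutputFormat.checkOutputSpecs(fs, job);
  }
}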
Issue Links
- is related to: SQOOP-3010 Sqoop should not allow --as-parquetfile with hcatalog jobs or when hive import with create-hive-table is used (Resolved)
- relates to: SQOOP-3047 Add support for (import + --hive-import + --as-parquet) when Parquet table already exists (Open)