Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: 1.1.0
- Environment: Built into CDH 5.4.7
Description
Description
When creating HFiles for a bulk load into HBase, I believe Hive is looking in the wrong directory for the HFiles, resulting in the following exception:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
	... 7 more
Caused by: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
	at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
	... 11 more
The issue is that it looks for the HFiles in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary, when I believe it should be looking in the task attempt subfolder, such as hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_000000_1000.
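As a minimal Java sketch of the distinction (hypothetical FamilyDirProbe/findFamilyDir names, not the actual Hive code): the writer-close logic should list the task's own work path rather than the shared _temporary root, and in classic MapReduce that path is available via FileOutputFormat.getWorkOutputPath:

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class FamilyDirProbe {
  // Returns the single column-family directory under this task's work path.
  static Path findFamilyDir(JobConf conf) throws IOException {
    // getWorkOutputPath resolves to the task attempt's private directory,
    // e.g. .../_temporary/2/_temporary/attempt_1461004169450_0002_r_000000_1000
    Path workPath = FileOutputFormat.getWorkOutputPath(conf);
    FileSystem fs = workPath.getFileSystem(conf);
    Path familyDir = null;
    for (FileStatus status : fs.listStatus(workPath)) {
      if (!status.isDirectory()) {
        continue;
      }
      if (familyDir != null) {
        // Listing the shared _temporary root instead of the per-task work
        // path sees every concurrent attempt's subdirectory, which is what
        // produces the "Multiple family directories found" IOException above.
        throw new IOException("Multiple family directories found in " + workPath);
      }
      familyDir = status.getPath();
    }
    return familyDir;
  }
}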
This can be reproduced with any HFile creation, such as:
CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,o:x,o:y',
  'hbase.table.default.storage.type' = 'binary');

SET hfile.family.path=/tmp/coords_hfiles/o;
SET hive.hbase.generatehfiles=true;

INSERT OVERWRITE TABLE coords_hbase
SELECT id, decimalLongitude, decimalLatitude
FROM source
CLUSTER BY id;
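Once the INSERT succeeds, the HFiles written under /tmp/coords_hfiles would normally be loaded with HBase's standard bulk-load tool, e.g. hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/coords_hfiles coords_hbase (assuming the HBase table is also named coords_hbase); the failure above happens before that step is reached.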
Any advice would be greatly appreciated.
Attachments
Issue Links
- relates to: HIVE-15341 "Get work path instead of attempted task path in HiveHFileOutputFormat" (Resolved)
- links to