Details
Type: Bug
Status: Patch Available
Priority: Major
Resolution: Unresolved
1.0.0
None
None
Description
No idea why hbase_handler_bulk.q does not catch this if it's being run regularly in Hive builds, but here is the gist of the issue:
The condition at https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164 makes us loop indefinitely until we find a file whose last path component (its name) equals the column family name.
In practice, however, the iteration becomes a genuine infinite loop, because the path we end up treating as srcDir is actually the region file, whose name will never match the family name.
Here is an example of the IPC traffic that the listing loop of a task at 100% progress gets stuck repeating:
2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c" startAfter: "" needLocation: false}
2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive sending #510346
2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got value #510346
2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getListing took 0ms
2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 4600 } owner: "hive" group: "hive" modification_time: 1437454718130 access_time: 1437454717973 block_replication: 1 blocksize: 134217728 fileId: 33960 childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
The path we get out of the listing results is /user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c. Instead of checking that path's parent, family, we keep looping on its hashed file name, 97112ac1c09548ae87bd85af072d2e8c, which does not match family. Since listing a file returns just that file (note the single IS_FILE entry in the response above), srcDir never changes from one iteration to the next.
The task therefore stays in that loop until the MR framework kills it for exceeding the idle task timeout. The subsequent task attempts then fail outright, so the job fails.
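To make the hang concrete, here is a stdlib-only Java model of the descent loop. It is a sketch, not the Hive code: a Map stands in for Hadoop's FileSystem, DescentLoopModel and its methods are hypothetical names, and an iteration cap is added so the hang can be observed rather than waited out. The one behaviour it borrows from Hadoop is that listing a file yields a single entry for the file itself, which turns the HFile into a fixed point of the loop.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only model of the single-entry descent loop.
// Keys of the map are directories and values are their children;
// any path that is not a key is treated as a file.
final class DescentLoopModel {
    private final Map<String, List<String>> dirs;

    DescentLoopModel(Map<String, List<String>> dirs) {
        this.dirs = dirs;
    }

    // Mimics listStatus semantics: a directory lists its children,
    // while a file lists as a single entry for the file itself.
    List<String> listStatus(String path) {
        return dirs.getOrDefault(path, List.of(path));
    }

    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Same shape as the loop described in this issue, plus an iteration cap.
    // Returns the matching path, or the path srcDir was stuck on when the cap hit.
    String descend(String outputDir, String family, int maxIterations) {
        String srcDir = outputDir;
        for (int i = 0; i < maxIterations; i++) {
            List<String> files = listStatus(srcDir);
            if (files.isEmpty()) {
                throw new IllegalStateException("No files found in " + srcDir);
            }
            if (files.size() != 1) {
                throw new IllegalStateException("Multiple files found in " + srcDir);
            }
            srcDir = files.get(0);
            if (name(srcDir).equals(family)) {
                return srcDir; // the only normal exit
            }
        }
        return srcDir; // cap reached: the uncapped loop would still be spinning here
    }

    public static void main(String[] args) {
        String attempt = "/out/_temporary/1/_temporary/attempt_0";
        String familyDir = attempt + "/family";
        String hfile = familyDir + "/97112ac1c09548ae87bd85af072d2e8c";
        DescentLoopModel fs = new DescentLoopModel(Map.of(
                attempt, List.of(familyDir),
                familyDir, List.of(hfile)));
        // "cf" is a hypothetical name that matches nothing on this path.
        System.out.println(fs.descend(attempt, "cf", 1000));
    }
}
```

With a name that matches nothing on the path, descend returns the HFile path: every iteration after the second lists the HFile and gets the HFile back, so the name check can never succeed.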
Calling getPath().getParent() would resolve that, but is the loop even necessary, given that we already throw exceptions when there are no entries or more than one entry?
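One way to act on the getParent() suggestion, sketched under the same assumptions as a stdlib-only model (a Map standing in for FileSystem; ParentDirSketch, findFamilyDir and the fail-fast exception are hypothetical and not the attached patch): descend through single-entry directories, stop at the first file, and take its parent as srcDir. In Hadoop terms the directory test would be FileStatus.isDirectory(). Each step either moves one level deeper or exits, so the walk ends on any finite, acyclic tree, and a name mismatch surfaces as an error instead of a hang.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the getParent() approach. Keys of the map are
// directories and values are their children; every other path is a file.
final class ParentDirSketch {
    private final Map<String, List<String>> dirs;

    ParentDirSketch(Map<String, List<String>> dirs) {
        this.dirs = dirs;
    }

    boolean isDirectory(String path) {
        return dirs.containsKey(path);
    }

    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    static String parent(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    // Walks down single-entry directories; when the single entry is a file,
    // its parent is the directory to move files from.
    String findFamilyDir(String outputDir, String family) {
        String current = outputDir;
        while (true) {
            List<String> entries = dirs.get(current);
            if (entries == null || entries.isEmpty()) {
                throw new IllegalStateException("No files found in " + current);
            }
            if (entries.size() != 1) {
                throw new IllegalStateException("Multiple files found in " + current);
            }
            String entry = entries.get(0);
            if (!isDirectory(entry)) {
                // Equivalent of getPath().getParent() on the HFile's status.
                String srcDir = parent(entry);
                if (!name(srcDir).equals(family)) {
                    // Fail fast instead of looping on the file forever.
                    throw new IllegalStateException(
                            "Expected family directory " + family + " but found " + srcDir);
                }
                return srcDir;
            }
            current = entry; // descend one level
        }
    }

    public static void main(String[] args) {
        String attempt = "/out/_temporary/1/_temporary/attempt_0";
        String familyDir = attempt + "/family";
        String hfile = familyDir + "/97112ac1c09548ae87bd85af072d2e8c";
        ParentDirSketch fs = new ParentDirSketch(Map.of(
                attempt, List.of(familyDir),
                familyDir, List.of(hfile)));
        System.out.println(fs.findFamilyDir(attempt, "family"));
    }
}
```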