Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 1.1.0, 1.2.0
Fix Version/s: None
Component/s: None
Description
There are two tables; hbasetable_risk_control_defense_idx_uid is a Hive table mapped to an HBase table:
[root@dev01 ~]# hadoop fs -du -s -h /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
3.0 G 9.0 G /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
[root@dev01 ~]# hadoop fs -du -s -h /user/hive/warehouse/openapi_invoke_base
6.6 G 19.7 G /user/hive/warehouse/openapi_invoke_base
The smaller table is 3.0 G, which is greater than both hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size. Nevertheless, when these tables are joined, Hive automatically converts the join to a map join:
hive> select count(*) from hbasetable_risk_control_defense_idx_uid t1 join openapi_invoke_base t2 on (t1.key=t2.merchantid);
Query ID = root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5
Total jobs = 1
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5.log
2016-06-28 09:22:10 Starting to launch local task to process map join; maximum memory = 1908932608
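The root cause is that Hive uses /user/hive/warehouse/hbasetable_risk_control_defense_idx_uid as the table's location, but that directory is empty. Hive therefore sees the table as having no data on disk and automatically converts the join to a map join.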
My suggestion is that Hive should set the correct location when mapping an HBase table.
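Until the location is handled correctly, one possible workaround is to inspect the location Hive has recorded for the mapped table and to turn off automatic map-join conversion for the session before running the join. The following is only a sketch, not a confirmed fix from this issue: it assumes that disabling hive.auto.convert.join is acceptable for the session, and it reuses the table names and query from the description above.

```sql
-- Show the table metadata; the Location field shows the path Hive
-- uses when it estimates the table's size.
describe formatted hbasetable_risk_control_defense_idx_uid;

-- Assumption: a session-level workaround is acceptable.
-- Disable automatic conversion of joins to map joins so that the
-- empty warehouse directory cannot trigger a map join.
set hive.auto.convert.join=false;

-- Same query as in the description, now executed as a regular join.
select count(*)
from hbasetable_risk_control_defense_idx_uid t1
join openapi_invoke_base t2 on (t1.key = t2.merchantid);
```

The setting applies to every join in the session, so it may be worth restoring it with set hive.auto.convert.join=true; once the query has finished.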