Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.8.0
Fix Version/s: None
None
Environment: CentOS release 5.5 (Final)
Description
Hive version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
Hadoop version: cdh3u0
I am trying to use Hive's small-file merge feature and have set all the necessary params.
I have disabled CombineHiveInputFormat because my input is compressed text.
hive> set mapred.min.split.size.per.node=1000000000;
hive> set mapred.min.split.size.per.rack=1000000000;
hive> set mapred.max.split.size=1000000000;
hive> set hive.merge.size.per.task=1000000000;
hive> set hive.merge.smallfiles.avgsize=1000000000;
hive> set hive.merge.size.smallfiles.avgsize=1000000000;
hive> set hive.merge.mapfiles=true;
hive> set hive.merge.mapredfiles=true;
hive> set hive.mergejob.maponly=false;
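The report does not show how CombineHiveInputFormat was disabled. One common way, given here as an assumption rather than the reporter's actual command, is to point hive.input.format at the plain HiveInputFormat class:

```sql
-- Assumption: the session was using CombineHiveInputFormat as its input format.
-- Switching to HiveInputFormat turns off split combining for this session.
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Print the current value to confirm the setting took effect.
set hive.input.format;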
The plan decides to launch two MR jobs, but after the first job succeeds I get a runtime error:
"java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
How to reproduce:
- Create tables as follows:
--create input table
create table tmp_notmerged (
  id int,
  name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
--create o/p table
create table tmp_merged (
  id int
)
STORED AS TEXTFILE;
- Load data into tmp_notmerged (the input files are attached to this JIRA)
- Set the knobs and run the Hive query:
set hive.merge.mapfiles=true;
set hive.mergejob.maponly=false;
insert overwrite table tmp_merged select id from tmp_notmerged;
- You should see error "java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified"
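The load step above does not include a command. A minimal sketch, assuming the attached files were saved to a local directory (the path below is hypothetical and not part of the original report):

```sql
-- Hypothetical local path: replace with wherever the attached files were saved.
-- Pointing at a directory loads every file in it, which keeps the input as many small files.
LOAD DATA LOCAL INPATH '/tmp/hive_merge_repro/' INTO TABLE tmp_notmerged;

-- Optional sanity check that rows were loaded before running the insert.
SELECT COUNT(1) FROM tmp_notmerged;
```

Loading several small files, rather than one large file, is what should make the merge stage relevant to the insert overwrite query.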
Proposed fix:
Patch is here: https://gist.github.com/2025303