Index: conf/hive-default.xml
===================================================================
--- conf/hive-default.xml	(revision 905409)
+++ conf/hive-default.xml	(working copy)
@@ -386,7 +386,7 @@
 <property>
   <name>hive.merge.mapredfiles</name>
   <value>false</value>
-  <description>Merge small files at the end of any job(map only or map-reduce)</description>
+  <description>Merge small files at the end of a map-reduce job</description>
 </property>
 
 <property>
@@ -397,11 +397,17 @@
 
 <property>
   <name>hive.merge.size.per.task</name>
-  <value>256000000</value>
+  <value>32000000</value>
   <description>Size of merged files at the end of the job</description>
 </property>
 
 <property>
+  <name>hive.merge.smallfiles.avgsize</name>
+  <value>16000000</value>
+  <description>When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.</description>
+</property>
+
+<property>
   <name>hive.script.auto.progress</name>
   <value>false</value>
   <description>Whether Hive Tranform/Map/Reduce Clause should automatically send progress information to TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need of periodically producing stderr messages, but users should be cautious because this may prevent infinite loops in the scripts to be killed by TaskTracker.</description>
Index: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
===================================================================
--- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(revision 905409)
+++ common/src/java/org/apache/hadoop/hive/conf/HiveConf.java	(working copy)
@@ -167,7 +167,7 @@
     HIVEMERGEMAPFILES("hive.merge.mapfiles", true),
     HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false),
-    HIVEMERGEMAPFILESSIZE("hive.merge.size.per.task", (long)(256*1000*1000)),
+    HIVEMERGEMAPFILESSIZE("hive.merge.size.per.task", (long)(32*1000*1000)),
     HIVEMERGEMAPFILESAVGSIZE("hive.merge.smallfiles.avgsize", (long)(16*1000*1000)),
     HIVESKEWJOIN("hive.optimize.skewjoin", false),
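
Note on the new description: the merge condition it documents can be sketched as below. This is a minimal illustration and not Hive's actual implementation. The class `MergeDecision`, the method `shouldMerge`, and its parameters are hypothetical names; only the 16000000-byte default and the gating by `hive.merge.mapfiles` / `hive.merge.mapredfiles` come from the patch.

```java
public class MergeDecision {
    // Default for hive.merge.smallfiles.avgsize as set in this patch (bytes).
    public static final long SMALLFILES_AVG_SIZE = 16L * 1000 * 1000;

    /**
     * Hypothetical helper: decides whether an extra merge job would be started.
     *
     * @param totalBytes       total size of the job's output files
     * @param fileCount        number of output files
     * @param mapOnly          true if the job had no reduce phase
     * @param mergeMapFiles    value of hive.merge.mapfiles
     * @param mergeMapRedFiles value of hive.merge.mapredfiles
     * @param avgSizeThreshold value of hive.merge.smallfiles.avgsize
     */
    public static boolean shouldMerge(long totalBytes, int fileCount, boolean mapOnly,
                                      boolean mergeMapFiles, boolean mergeMapRedFiles,
                                      long avgSizeThreshold) {
        if (fileCount <= 0) {
            return false; // nothing to merge
        }
        // The flag that applies depends on the job type.
        boolean enabled = mapOnly ? mergeMapFiles : mergeMapRedFiles;
        if (!enabled) {
            return false;
        }
        // Merge only when the average output file is smaller than the threshold.
        long averageSize = totalBytes / fileCount;
        return averageSize < avgSizeThreshold;
    }

    public static void main(String[] args) {
        // Map-only job, 10 files of 4 MB each, hive.merge.mapfiles=true.
        System.out.println(shouldMerge(40_000_000L, 10, true, true, false, SMALLFILES_AVG_SIZE));
        // Same output from a map-reduce job with hive.merge.mapredfiles=false.
        System.out.println(shouldMerge(40_000_000L, 10, false, true, false, SMALLFILES_AVG_SIZE));
    }
}
```

With the defaults in this patch, a map-only job is a merge candidate whenever its average output file is under 16 MB, while a map-reduce job is never merged unless `hive.merge.mapredfiles` is switched on.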