Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
None
None
Reviewed
Description
Hive doesn't depend on the Hadoop job output folder; it produces output exclusively via side-effect folders. We should use an OutputFormat that can request that Hadoop skip job setup and cleanup. This could be NullOutputFormat (unless there are objections in Hadoop to changing NullOutputFormat's behavior).
As a small side benefit, this also avoids some entirely unnecessary file creates and deletes in HDFS.
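
The idea can be illustrated with a small, self-contained Java model. This is only a sketch and does not use Hadoop's actual classes: `OutputFormatModel`, `needsJobSetupAndCleanup()`, `FolderOutputFormat`, `NullLikeOutputFormat`, and `runJob` are hypothetical names invented for this example. It assumes the framework asks the output format whether job setup and cleanup are needed, and skips the related file system operations when they are not.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipSetupCleanupSketch {

    /** Hypothetical stand-in for an output format; not Hadoop's real interface. */
    interface OutputFormatModel {
        /** True if the framework must create and later remove a job output folder. */
        boolean needsJobSetupAndCleanup();
    }

    /** Models an output format that relies on the job output folder. */
    static final class FolderOutputFormat implements OutputFormatModel {
        @Override
        public boolean needsJobSetupAndCleanup() {
            return true;
        }
    }

    /** Models a null-style output format that asks the framework to skip setup and cleanup. */
    static final class NullLikeOutputFormat implements OutputFormatModel {
        @Override
        public boolean needsJobSetupAndCleanup() {
            return false;
        }
    }

    /** Runs a toy job and returns the file system operations it would perform, in order. */
    static List<String> runJob(OutputFormatModel format) {
        List<String> fsOps = new ArrayList<>();
        if (format.needsJobSetupAndCleanup()) {
            fsOps.add("create job output folder");
        }
        // The job's real output goes to a side-effect folder in both cases.
        fsOps.add("write results to side-effect folder");
        if (format.needsJobSetupAndCleanup()) {
            fsOps.add("delete temporary job output folder");
        }
        return fsOps;
    }

    public static void main(String[] args) {
        System.out.println(runJob(new FolderOutputFormat()));
        System.out.println(runJob(new NullLikeOutputFormat()));
    }
}
```

In this model the null-style format leaves only the side-effect write, which corresponds to the avoided creates and deletes mentioned above.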