Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Bug
Affects Version/s: 1.6.0
Fix Version/s: None
None
Environment: spark 1.6.1
Description
I find that Spark SQL creates as many output files as there are partitions. When the result is small, this produces a large number of small files, and most of them are empty.
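The following is a simplified simulation, not Spark code, of why a tiny result still produces many files. It assumes the rows are hash-partitioned into 200 buckets (200 is the default of `spark.sql.shuffle.partitions`) and that one file is written per bucket. Python's built-in `hash` stands in for Spark's partitioner, and `simulate_write` is a hypothetical helper.

```python
# Simplified simulation (not Spark code): rows are hash-partitioned into
# num_partitions buckets and one output "file" is written per bucket,
# mirroring the one-file-per-partition behaviour described above.
# Python's hash() is used here only as a stand-in for Spark's partitioner.

def simulate_write(rows, num_partitions=200):
    """Return a list of per-file row counts, one entry per partition."""
    files = [0] * num_partitions
    for row in rows:
        files[hash(row) % num_partitions] += 1
    return files

result_rows = list(range(5))  # a tiny, hypothetical result set of 5 rows
files = simulate_write(result_rows)
print(len(files))                        # 200 files written
print(sum(1 for n in files if n == 0))   # 195 of them are empty
```

Five rows end up spread across 200 files, so 195 files contain no data at all.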
Hive has a feature that checks the average size of the output files: if the average file size is smaller than "hive.merge.smallfiles.avgsize", Hive adds an extra job to merge the files.
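A rough sketch of the Hive behaviour described above, assuming a simplified model rather than Hive's actual implementation: compare the average file size with "hive.merge.smallfiles.avgsize" and, if it is below the threshold, greedily pack the files into groups of about "hive.merge.size.per.task" bytes, each group becoming one merged file. The defaults used here (16000000 and 256000000 bytes) are assumptions, as are the helper names `needs_merge` and `plan_merge`.

```python
# Simplified sketch of Hive's small-file merge decision (not Hive's source).
# Assumed defaults: hive.merge.smallfiles.avgsize = 16000000 bytes,
# hive.merge.size.per.task = 256000000 bytes.

SMALLFILES_AVGSIZE = 16_000_000
SIZE_PER_TASK = 256_000_000

def needs_merge(file_sizes, avgsize=SMALLFILES_AVGSIZE):
    """True when the average output file size falls below the threshold."""
    if not file_sizes:
        return False
    return sum(file_sizes) / len(file_sizes) < avgsize

def plan_merge(file_sizes, target=SIZE_PER_TASK):
    """Greedily pack files into groups of roughly `target` bytes each;
    each group would become one merged output file."""
    groups, current, current_size = [], [], 0
    for size in file_sizes:
        # Start a new group once adding this file would exceed the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

sizes = [1_000] * 5 + [0] * 195   # 5 tiny files plus 195 empty ones
print(needs_merge(sizes))         # True: average is 25 bytes
print(len(plan_merge(sizes)))     # 1 merged file
```

With 200 mostly empty files the average is far below the threshold, so the merge step collapses them into a single file.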