[SPARK-16188] Spark sql create a lot of small files - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Not A Bug
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: SQL
Labels:
None
Environment:

Spark 1.6.1

Description

I find that Spark SQL creates as many output files as there are partitions. When the result is small, this produces a large number of small files, and most of them are empty.

Hive has a mechanism that checks the average output file size. If the average file size is smaller than "hive.merge.smallfiles.avgsize", Hive adds an extra job to merge the files.
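The Hive behaviour described above amounts to a simple threshold check on the average size of the files a job wrote. Below is a minimal Python sketch of that decision, not Hive's actual code. Two things are assumptions: the 16,000,000-byte default for hive.merge.smallfiles.avgsize, and the hypothetical helper target_file_count, which estimates how many files the data would need at a chosen target size.

```python
import math

# Assumption: 16,000,000 bytes is taken as the default of hive.merge.smallfiles.avgsize;
# verify against your own Hive configuration.
DEFAULT_SMALLFILES_AVGSIZE = 16_000_000


def needs_merge(file_sizes, avgsize_threshold=DEFAULT_SMALLFILES_AVGSIZE):
    """Return True when the average output file size is below the threshold."""
    if not file_sizes:
        # No output files: nothing to merge.
        return False
    avg = sum(file_sizes) / len(file_sizes)
    return avg < avgsize_threshold


def target_file_count(file_sizes, target_bytes):
    """Hypothetical helper: files needed to hold the data at about target_bytes each."""
    total = sum(file_sizes)
    # Always keep at least one file, even when every partition was empty.
    return max(1, math.ceil(total / target_bytes))


# Example: 200 partitions, only 3 of which received about 1 MB of data.
sizes = [1_000_000] * 3 + [0] * 197
print(needs_merge(sizes))                     # average is 15,000 bytes, so a merge is suggested
print(target_file_count(sizes, 256_000_000))  # all data fits in a single file
```

On the Spark side, a comparable effect can be approximated by reducing the partition count before writing, for example with DataFrame.coalesce(n), where n is chosen along the lines of target_file_count. The actual choice of n is left to the user.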

People

Assignee: Unassigned

Reporter: cen yuhai

Votes: 0

Watchers: 5

Dates

Created: 24/Jun/16 08:41

Updated: 12/Dec/22 18:11

Resolved: 29/Jun/16 00:55