  1. Spark
  2. SPARK-16188

Spark SQL creates a lot of small files

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 1.6.1

    Description

      I find that Spark SQL writes as many output files as there are partitions. When the result set is small, this produces too many small files, and most of them are empty.

      Hive has a feature that checks the average size of the output files. If the average file size is smaller than "hive.merge.smallfiles.avgsize", Hive adds an extra job to merge the files.
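The check described above can be sketched roughly as follows. This is only an illustration of the idea, not Hive's or Spark's actual code: the function name `plan_merge` is hypothetical, and the two threshold values are assumptions chosen for the example, named after the Hive settings they stand in for.

```python
import math

# Assumed thresholds for illustration; they stand in for the Hive settings
# hive.merge.smallfiles.avgsize and hive.merge.size.per.task.
AVG_SIZE_THRESHOLD = 16_000_000
TARGET_FILE_SIZE = 256_000_000


def plan_merge(file_sizes, avg_threshold=AVG_SIZE_THRESHOLD,
               target_size=TARGET_FILE_SIZE):
    """Return how many files to merge the output into, or None if no merge is needed.

    file_sizes is a list of output file sizes in bytes (hypothetical input).
    """
    if not file_sizes:
        return None
    total = sum(file_sizes)
    average = total / len(file_sizes)
    # Merge only when the average output file is below the threshold.
    if average >= avg_threshold:
        return None
    # Aim for files of roughly target_size bytes, but always keep at least one.
    return max(1, math.ceil(total / target_size))
```

With Spark, a count computed this way could be passed to `DataFrame.coalesce(n)` before the write, so that the job emits `n` files instead of one file per partition.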

          People

            Assignee: Unassigned
            Reporter: cen yuhai (cenyuhai)
            Votes: 0
            Watchers: 5