Spark / SPARK-9932 Data source API improvement (Spark 1.6) / SPARK-10297

When saving data to a data source table, we should bound the size of each saved file


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL

    Description

      When we save a table to a data source table, it is possible that a single writer is responsible for writing out a large number of rows, which can make the generated file very large and cause the job to fail if the underlying storage system has a limit on maximum file size (e.g. S3's limit is 5 GB). We should bound the size of a file generated by a writer and create new writers for the same partition when necessary.
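      The rollover scheme described above can be sketched as follows. This is an illustrative sketch, not Spark's actual writer API: the `BoundedRollingWriter` name, the `open_file` callback, and counting raw row bytes against the bound are all assumptions made for the example.

```python
class BoundedRollingWriter:
    """Split one partition's output across several files so that no single
    file exceeds max_bytes (e.g. to stay under a storage system's file-size
    limit). Hypothetical interface, for illustration only."""

    def __init__(self, open_file, max_bytes):
        # open_file: callable taking a file index and returning a writable
        # file-like object (write/close) for that part of the partition.
        self.open_file = open_file
        self.max_bytes = max_bytes
        self.index = 0
        self.written = 0
        self.out = open_file(self.index)

    def write_row(self, row_bytes):
        # Roll over to a fresh file once the current one would exceed the
        # bound; never roll on an empty file, so oversized single rows
        # still get written somewhere.
        if self.written > 0 and self.written + len(row_bytes) > self.max_bytes:
            self.out.close()
            self.index += 1
            self.out = self.open_file(self.index)
            self.written = 0
        self.out.write(row_bytes)
        self.written += len(row_bytes)

    def close(self):
        self.out.close()
```

      A task writing a partition would call `write_row` per row and `close` at the end; the storage layer sees several bounded files instead of one unbounded one.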


          People

            Assignee: Unassigned
            Reporter: yhuai Yin Huai
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: