Spark / SPARK-19525

Enable Compression of RDD Checkpoints

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: Spark Core
    • Labels: None

    Description

      In our testing, compressing partitions with Snappy while writing them to checkpoints on HDFS significantly improved performance and also reduced the variability of the checkpointing operation. For data sets of approximately 1 GB compressed size, checkpointing time dropped by 3x and variability by 2x.
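
A minimal PySpark sketch of how checkpoint compression might be switched on, assuming the `spark.checkpoint.compress` setting available in Spark 2.2.0 and later, with the codec chosen through `spark.io.compression.codec`. The helper names, the local master, the sample RDD, and the checkpoint directory are hypothetical and are not taken from this issue; the benchmark setup described above is not reproduced here.

```python
def checkpoint_compression_conf(codec="snappy"):
    """Return Spark settings that turn on compressed RDD checkpoints.

    Assumption: Spark 2.2.0 or later, where `spark.checkpoint.compress`
    is recognized. The codec is taken from `spark.io.compression.codec`.
    """
    return {
        "spark.checkpoint.compress": "true",
        "spark.io.compression.codec": codec,
    }


def run_checkpoint_demo(checkpoint_dir="/tmp/spark-checkpoints"):
    """Checkpoint a small RDD with compression enabled.

    `checkpoint_dir` is a hypothetical path; on a cluster it would
    normally point at an HDFS directory.
    """
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("checkpoint-compression-demo").setMaster("local[2]")
    for key, value in checkpoint_compression_conf().items():
        conf.set(key, value)

    sc = SparkContext(conf=conf)
    try:
        sc.setCheckpointDir(checkpoint_dir)
        rdd = sc.parallelize(range(100000), 8).map(lambda x: (x % 100, x))
        rdd.checkpoint()   # only marks the RDD; nothing is written yet
        rdd.count()        # the first action triggers the checkpoint write
        return rdd.isCheckpointed()
    finally:
        sc.stop()
```

Calling `run_checkpoint_demo()` requires a working PySpark installation. The same two settings could also be passed on the command line with `--conf` instead of being set in code.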

          People

            rameshaaditya117 Aaditya Ramesh
            rameshaaditya117 Aaditya Ramesh
            Shixiong Zhu Shixiong Zhu
            Votes: 0
            Watchers: 5
