Spark / SPARK-24370

Spark checkpoint creates many 0-byte empty files (partitions) in checkpoint directory


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Spark Shell
    • Labels: None

    Description

      We are currently facing an issue: when we call checkpoint on a DataFrame, it creates partition files in the checkpoint directory, but some of them are empty. As a result, we get exceptions when reading the DataFrame back.

      Do you have any idea how to avoid it?

      It creates 200 partitions, and some of them are empty. I used repartition(1) before checkpoint, but that is not a good workaround. Is there any way to populate all partitions with data, or to avoid the empty files?
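
      For readers hitting the same symptom, here is a minimal sketch of one plausible cause and two less drastic alternatives to repartition(1). It assumes the 200 partitions come from the default value of spark.sql.shuffle.partitions after a shuffle such as groupBy or join; the ticket does not confirm this. The local session, the /tmp/spark-checkpoints path, and the sample data are hypothetical and only there to make the snippet self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session and a hypothetical checkpoint path, for illustration only.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("checkpoint-empty-partitions")
  .getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

// A shuffle (here groupBy) produces spark.sql.shuffle.partitions output
// partitions, 200 by default. With only 3 distinct keys, most of those
// partitions receive no rows (unless adaptive execution merges them).
val aggregated = spark.range(0, 1000)
  .withColumn("key", col("id") % 3)
  .groupBy("key")
  .count()

// Option 1: merge partitions without an extra shuffle, then checkpoint.
// coalesce(n) yields at most n partitions, so far fewer files are written.
val compact = aggregated.coalesce(4).checkpoint()
println(s"partitions after coalesce: ${compact.rdd.getNumPartitions}")

// Option 2: lower the shuffle partition count for later shuffles in this
// session, so fewer (and fuller) partitions are created in the first place.
spark.conf.set("spark.sql.shuffle.partitions", "8")
```

      Neither option guarantees that every remaining partition is non-empty; they only reduce how many partitions, and therefore how many files, the checkpoint writes. A suitable partition count depends on the data volume.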

      Pasted snapshot.

      Attachments

        1. partitions.PNG
          17 kB
          Jami Malikzade


          People

            Assignee: Unassigned
            Reporter: Jami Malikzade (jimhox)
            Votes: 0
            Watchers: 2
