Spark / SPARK-31599

Reading from S3 (Structured Streaming Bucket) Fails after Compaction


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: SQL, Structured Streaming
    • Labels: None

    Description

      I have an S3 bucket to which Spark Structured Streaming writes data (in Parquet format) from Kafka. Periodically I run a compaction on this bucket (a separate Spark job), and after a successful compaction I delete the non-compacted Parquet files. After that, Spark jobs that read from the bucket fail with the following error:
      Caused by: java.io.FileNotFoundException: No such file or directory: s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet

      How do we run compaction on a Structured Streaming S3 bucket? I also need to delete the un-compacted files after a successful compaction to save space.
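      A likely explanation for the FileNotFoundException: Spark's Structured Streaming file sink records every output file in a `_spark_metadata` log inside the output directory, and downstream Spark readers consult that log rather than listing the directory. A compaction job that rewrites and deletes files without updating the log leaves readers pointing at paths that no longer exist. The sketch below is a stdlib-only simulation of that mismatch, not real Spark or S3 API usage; the file names and the JSON log format are simplified stand-ins for the actual sink layout.

      ```python
      # Simulate: streaming sink writes files plus a metadata log; an external
      # compaction job merges and deletes the files but leaves the log stale;
      # a reader that trusts the log then hits FileNotFoundError.
      import json
      import tempfile
      from pathlib import Path

      def write_stream_output(bucket: Path, names: list) -> None:
          """Stand-in for the streaming sink: data files + metadata log."""
          for name in names:
              (bucket / name).write_text("parquet-bytes")
          # Simplified log: a JSON list of committed file paths.
          (bucket / "_spark_metadata").write_text(
              json.dumps([str(bucket / n) for n in names]))

      def compact(bucket: Path, names: list) -> None:
          """Stand-in for the compaction job: merge, then delete originals
          WITHOUT touching the metadata log."""
          merged = "".join((bucket / n).read_text() for n in names)
          (bucket / "part-compacted.parquet").write_text(merged)
          for name in names:
              (bucket / name).unlink()      # originals are gone...

      def read_via_metadata(bucket: Path) -> list:
          """Stand-in for a Spark reader: open exactly what the log lists."""
          listed = json.loads((bucket / "_spark_metadata").read_text())
          return [Path(p).read_text() for p in listed]  # ...but still listed

      bucket = Path(tempfile.mkdtemp())
      names = ["part-00000.parquet", "part-00001.parquet"]
      write_stream_output(bucket, names)
      compact(bucket, names)
      try:
          read_via_metadata(bucket)
          outcome = "ok"
      except FileNotFoundError:
          outcome = "FileNotFoundError"     # same failure mode as the report
      ```

      This suggests why the resolution here is "Invalid" (a usage question rather than a Spark bug): any compaction run outside the streaming query either has to write to a separate output path that readers use directly, or has to keep whatever file listing the readers depend on consistent with the files actually present.
      
      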

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: FelixKJose (Felix Kizhakkel Jose)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: