Spark / SPARK-31599

Reading from S3 (Structured Streaming Bucket) Fails after Compaction


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: SQL, Structured Streaming
    • Labels: None

    Description

      I have an S3 bucket to which Spark Structured Streaming writes data (in Parquet format) from Kafka. Periodically I run a compaction on this bucket (a separate Spark job), and after a successful compaction I delete the non-compacted Parquet files. After that, Spark jobs that read from the bucket fail with the following error:
      Caused by: java.io.FileNotFoundException: No such file or directory: s3a://spark-kafka-poc/intermediate/part-00000-05ff7893-8a13-4dcd-aeed-3f0d4b5d1691-c000.gz.parquet

      How do we run compaction on a Structured Streaming S3 bucket? I also need to delete the un-compacted files after a successful compaction to save space.
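      A likely explanation for the FileNotFoundException: Spark's Structured Streaming file sink records every output file in a `_spark_metadata` log inside the output directory, and downstream Spark readers consult that log rather than listing the directory. A compaction job that rewrites and deletes files without updating the log leaves readers pointing at paths that no longer exist. The sketch below is a stdlib-only simulation of that mismatch, not real Spark or S3 API usage; the file names and the JSON log format are simplified stand-ins for the actual sink layout.

      ```python
      # Simulate: streaming sink writes files plus a metadata log; an external
      # compaction job merges and deletes the files but leaves the log stale;
      # a reader that trusts the log then hits FileNotFoundError.
      import json
      import tempfile
      from pathlib import Path

      def write_stream_output(bucket: Path, names: list) -> None:
          """Stand-in for the streaming sink: data files + metadata log."""
          for name in names:
              (bucket / name).write_text("parquet-bytes")
          # Simplified log: a JSON list of committed file paths.
          (bucket / "_spark_metadata").write_text(
              json.dumps([str(bucket / n) for n in names]))

      def compact(bucket: Path, names: list) -> None:
          """Stand-in for the compaction job: merge, then delete originals
          WITHOUT touching the metadata log."""
          merged = "".join((bucket / n).read_text() for n in names)
          (bucket / "part-compacted.parquet").write_text(merged)
          for name in names:
              (bucket / name).unlink()      # originals are gone...

      def read_via_metadata(bucket: Path) -> list:
          """Stand-in for a Spark reader: open exactly what the log lists."""
          listed = json.loads((bucket / "_spark_metadata").read_text())
          return [Path(p).read_text() for p in listed]  # ...but still listed

      bucket = Path(tempfile.mkdtemp())
      names = ["part-00000.parquet", "part-00001.parquet"]
      write_stream_output(bucket, names)
      compact(bucket, names)
      try:
          read_via_metadata(bucket)
          outcome = "ok"
      except FileNotFoundError:
          outcome = "FileNotFoundError"     # same failure mode as the report
      ```

      This suggests why the resolution here is "Invalid" (a usage question rather than a Spark bug): any compaction run outside the streaming query either has to write to a separate output path that readers use directly, or has to keep whatever file listing the readers depend on consistent with the files actually present.
      
      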

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: FelixKJose (Felix Kizhakkel Jose)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: