I have a parquet dataset in S3 partitioned by date (dt), with the oldest dates stored in AWS Glacier to save some money. For instance, we have...
I want to read this dataset, but only the subset of dates that are not yet in Glacier, e.g.:
Unfortunately, I get this exception:
It seems that Spark does not like a partitioned dataset when some partitions are in Glacier. I could always read each date individually, add the dt column, and reduce(_ union _) at the end, but that is not pretty and it should not be necessary.
Is there any tip to read the available data in the datastore even when old data is in Glacier?
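For reference, the per-date workaround described above would look roughly like this (the bucket, path, and dates here are hypothetical placeholders, not my actual values):

```scala
import org.apache.spark.sql.functions.lit

// Hypothetical list of dates known to still be in S3 Standard (not Glacier)
val availableDates = Seq("2019-01-01", "2019-01-02")

val df = availableDates
  .map { dt =>
    spark.read
      .parquet(s"s3://my-bucket/my-dataset/dt=$dt") // read one partition directly
      .withColumn("dt", lit(dt))                    // re-add the partition column lost by direct-path reads
  }
  .reduce(_ union _)
```

This works, but it bypasses partition discovery entirely and requires me to maintain the list of readable dates myself.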