Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Delta Streamer is great for incremental workloads, but we also need to support backfills. Two example use cases: adding a new column and backfilling only that column for the last 6 months, or reprocessing a couple of older partitions after fixing a bug in our transformation logic.
If a SqlSource were available as one of the Delta Streamer input sources, we could pass any custom Spark SQL query that selects specific partitions and backfill only those.
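To make the idea concrete, here is a minimal sketch of how a partition-scoped backfill query could be assembled before handing it to such a source. The function name `build_backfill_query`, the table `db.events`, and the columns `id`, `new_col`, and `dt` are all hypothetical and only illustrate the shape of the query, not any existing Delta Streamer configuration.

```python
def build_backfill_query(table, columns, partition_column, partitions):
    """Build a Spark SQL query restricted to the given partition values.

    All names here are illustrative assumptions, not part of Delta Streamer.
    """
    if not partitions:
        raise ValueError("at least one partition is required for a backfill")
    # Escape single quotes so partition values are safe inside SQL string literals.
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in partitions)
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE {partition_column} IN ({quoted})"
    )


if __name__ == "__main__":
    # Backfill only the new column for two older partitions.
    print(build_backfill_query("db.events", ["id", "new_col"], "dt",
                               ["2019-01-01", "2019-01-02"]))
```

The resulting string is the kind of query the proposed SqlSource would run through Spark SQL, so only the listed partitions are read and rewritten.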
When we do the backfill, we must not advance the last processed commit checkpoint. Instead, the backfill should read the last processed checkpoint as it was before the backfill and copy it over unchanged to the backfill commit.
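The checkpoint carry-over described above could look roughly like the following sketch. It models the commit timeline as a plain list of dictionaries; the `CHECKPOINT_KEY` value, the `backfill` marker, and both function names are assumptions made for illustration and do not reflect the actual Delta Streamer metadata layout.

```python
CHECKPOINT_KEY = "checkpoint"  # hypothetical metadata key, chosen for this sketch


def latest_checkpoint(timeline):
    """Return the checkpoint of the most recent commit that has one, else None."""
    for commit in reversed(timeline):
        value = commit.get("metadata", {}).get(CHECKPOINT_KEY)
        if value is not None:
            return value
    return None


def append_backfill_commit(timeline, instant_time):
    """Append a backfill commit that carries the previous checkpoint forward unchanged."""
    metadata = {"backfill": "true"}  # hypothetical marker for backfill commits
    previous = latest_checkpoint(timeline)
    if previous is not None:
        # Copy, rather than recompute, so the incremental pipeline is unaffected.
        metadata[CHECKPOINT_KEY] = previous
    commit = {"instant": instant_time, "metadata": metadata}
    timeline.append(commit)
    return commit


if __name__ == "__main__":
    timeline = [{"instant": "001", "metadata": {CHECKPOINT_KEY: "offset-42"}}]
    append_backfill_commit(timeline, "002")
    print(latest_checkpoint(timeline))
```

Because the backfill commit repeats the earlier checkpoint, the next incremental run that reads the latest commit would resume from the same position as if the backfill had never happened.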
cc nishith29