We encountered a real-world case where Spark fails the query if some of the partitions don't have a matching offset for the given timestamp.
This is intended behavior, meant to avoid producing unintended output in cases like:
- timestamp 2 is specified as the timestamp offset, but some partitions don't have a record with that timestamp yet
- a record with timestamp 1 arrives "later", in a following micro-batch
which is possible since Kafka allows producers to specify the timestamp on a record.
Here the unintended output we talked about is the risk of reading the record with timestamp 1 in the next micro-batch even though the option specifies timestamp 2.
But in many cases end users simply assume the timestamp increases monotonically, and the current behavior blocks these cases from making progress.
When the timestamp can be assumed to increase monotonically, it's safe to treat the offset for a partition with no record matching the timestamp as the latest offset (technically, the offset of the latest record + 1).
This is particularly helpful when there's skew between partitions and some partitions only have older records.
- AS-IS: Spark simply fails the query, and end users have to deal with workarounds requiring manual steps.
- TO-BE: Spark assigns the latest offset to these partitions, so that it can read newer records from them in subsequent micro-batches.
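The TO-BE resolution can be sketched in plain Python (helper and parameter names here are hypothetical; Spark's Kafka source actually resolves timestamps to offsets via the Kafka consumer's `offsetsForTimes`, which returns null for partitions with no matching record):

```python
def resolve_start_offsets(offsets_for_times, end_offsets, strategy="error"):
    """Resolve per-partition start offsets for a timestamp-based start.

    offsets_for_times: partition -> offset matching the timestamp, or None
                       when no record at/after the timestamp exists
                       (mirrors Kafka's offsetsForTimes returning null).
    end_offsets: partition -> offset of the latest record + 1.
    strategy: "error" keeps the current fail-fast behavior;
              "latest" falls back to the end offset for unmatched partitions.
    """
    resolved = {}
    for partition, offset in offsets_for_times.items():
        if offset is not None:
            resolved[partition] = offset
        elif strategy == "latest":
            # No record matched the timestamp: start at the end of the
            # partition so newer records are read in later micro-batches.
            resolved[partition] = end_offsets[partition]
        else:
            raise ValueError(
                f"No offset matched the timestamp for partition {partition}"
            )
    return resolved
```

With `strategy="latest"`, a skewed partition that has only older records starts at its end offset instead of failing the query; with the default `"error"`, behavior is unchanged.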
To retain the existing behavior while also enabling the proposed "TO-BE" behavior, we'd like to introduce a strategy option for handling mismatched offsets on the start timestamp.
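End-user usage might look like the PySpark sketch below. The option name `startingOffsetsByTimestampStrategy` and its values (`error`, `latest`) are what this proposal introduces and could differ in the final implementation; `startingOffsetsByTimestamp` is the existing Kafka source option.

```python
# Hypothetical usage sketch for the proposed strategy option; the name
# "startingOffsetsByTimestampStrategy" and its values are part of the
# proposal, not yet a settled API.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:9092")
    .option("subscribe", "topic1")
    # Start from records at/after these per-partition timestamps (ms).
    .option("startingOffsetsByTimestamp",
            """{"topic1": {"0": 1000, "1": 1000}}""")
    # "error": fail when a partition has no matching offset (current behavior)
    # "latest": fall back to the partition's latest offset (proposed)
    .option("startingOffsetsByTimestampStrategy", "latest")
    .load()
)
```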