When a Kinesis shard is split or merged and the parent shard ends, the Amazon Kinesis Client Library (KCL) calls IRecordProcessor.shutdown and requires that it checkpoint the sequence number ExtendedSequenceNumber.SHARD_END before returning. Unfortunately, Spark's KinesisRecordProcessor sometimes does not checkpoint SHARD_END. This results in an error message, and Spark is then blocked indefinitely from processing any records from the child shards.
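To make the contract concrete, here is a minimal, self-contained sketch. The type names mirror the real KCL v1 types (IRecordProcessorCheckpointer, ShutdownReason), but they are stubbed locally so the example stands alone without the amazon-kinesis-client dependency; the point is only to show where the SHARD_END checkpoint must happen.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardEndCheckpointSketch {

    // TERMINATE means the shard reached its end (e.g. after a split/merge).
    enum ShutdownReason { TERMINATE, ZOMBIE }

    // Stub of the KCL checkpointer: calling checkpoint() with no arguments
    // at shutdown records ExtendedSequenceNumber.SHARD_END in the real KCL.
    interface IRecordProcessorCheckpointer {
        void checkpoint();
    }

    // A processor that honors the contract: on TERMINATE it must checkpoint
    // before returning, so the KCL records SHARD_END for the parent shard
    // and the child shards can start processing.
    static class CompliantProcessor {
        void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) {
            if (reason == ShutdownReason.TERMINATE) {
                checkpointer.checkpoint();  // required before returning
            }
            // On ZOMBIE the lease was lost; checkpointing would fail in the real KCL.
        }
    }

    public static void main(String[] args) {
        List<String> checkpoints = new ArrayList<>();
        CompliantProcessor processor = new CompliantProcessor();
        processor.shutdown(() -> checkpoints.add("SHARD_END"), ShutdownReason.TERMINATE);
        System.out.println("checkpointed: " + checkpoints);
    }
}
```

The bug described here is precisely the case where the TERMINATE branch is reached but no checkpoint is made, so the parent lease never advances to SHARD_END.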
This issue has also been raised on Stack Overflow: "resharding while spark running on kinesis stream".
Exception that is logged:
Command used to split shard:
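(The exact command used is not reproduced here. For reference, a shard split is typically performed with the AWS CLI along these lines; the stream name and shard id below are placeholders, and the hash key shown is the midpoint of the full hash-key range, which splits a full-range shard evenly.)

```shell
aws kinesis split-shard \
  --stream-name my-stream \
  --shard-to-split shardId-000000000000 \
  --new-starting-hash-key 170141183460469231731687303715884105728
```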
After the Spark Streaming job has hung, examining the KCL's DynamoDB lease table shows that the parent shard's processor has not reached ExtendedSequenceNumber.SHARD_END and that the child shards are still at ExtendedSequenceNumber.TRIM_HORIZON, waiting for the parent to finish:
Workaround: I manually deleted the parent shards' checkpoint entries from the DynamoDB table, after which the child shards were able to begin processing. I am not sure whether this caused a few records to be lost, however.
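For anyone reproducing the workaround: the KCL v1 lease table is keyed by leaseKey (the shard id), so a parent shard's entry can be removed with the AWS CLI roughly as follows. The table name and shard id below are placeholders (the table is named after the KCL application), and, as noted above, deleting a lease before SHARD_END is checkpointed may skip any unprocessed records in the parent shard.

```shell
aws dynamodb delete-item \
  --table-name my-kcl-application \
  --key '{"leaseKey": {"S": "shardId-000000000000"}}'
```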