Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
Currently, Samza has built-in support to consume from AWS Kinesis, Amazon's messaging service. There have been requests to offer native support for "DynamoDB Streams", which is Amazon's change-capture technology for DynamoDB.
What is DynamoDB Streams?
DynamoDB Streams captures a time-ordered sequence of updates to a DynamoDB table, and stores this information in a log for up to 24 hours. Use-cases include: propagation of table updates, change capture, database replication etc.
How does DynamoDB Streams differ from Kinesis?
While Kinesis is a general-purpose messaging service, DynamoDB Streams is specifically for capturing updates from DynamoDB.
What it takes to make Samza consume from a DynamoDB change-capture stream?
It should be possible to support change-capture from DynamoDB with minimal effort.
As a refresher, the KinesisSystemConsumer in Samza currently creates multiple KinesisWorkers, with each worker processing a single partition in the stream. By default, a Worker internally uses a "KinesisProxy" to consume data from Kinesis. We can configure it read from DynamoDB streams by simply pointing it to use a different proxy. ie, use the DynamoDBProxy instead of the default KinesisProxy when a worker is instantiated.
final Worker worker = StreamsWorkerFactory
.createDynamoDbStreamsWorker(
recordProcessorFactory,
workerConfig,
adapterClient,
amazonDynamoDB,
amazonCloudWatchClient);