This is a tracking JIRA for change data capture (CDC) support in Kudu.
This feature is not trivial and would need a significant amount of design work.
Here are some high-level considerations and potential requirements / approaches:
- New CDC Subscription APIs (remote and client)
- Support for distributed consumers (necessary if subscribing to all cluster changes at scale)
- Consensus support for permanent non-voters (in order to have a replication target)
- Ability to retain WAL segments if a CDC subscriber has not yet caught up
- Ability to re-sync subscribers that have fallen behind the leader's WAL. This is required to support a new subscriber joining after the cluster has been running for a long time.
To support re-syncing subscribers that have fallen behind the leader's WAL, we may be able to build a bridge API that lets a consumer ingest the results of a tablet copy operation without having to be a native Kudu tablet server itself.
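
To make the WAL retention and re-sync considerations above more concrete, here is a minimal sketch of how the two decisions might relate. It is not a design proposal and does not reflect existing Kudu code: `WalSegment`, `gc_eligible_segments`, and `needs_resync` are hypothetical names, and the sketch assumes each subscriber acknowledges the highest op index it has consumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalSegment:
    """Hypothetical closed WAL segment covering op indexes [first_index, last_index]."""
    first_index: int
    last_index: int


def gc_eligible_segments(segments, subscriber_acked):
    """Return the segments that every CDC subscriber has fully consumed.

    segments: list of WalSegment ordered by first_index.
    subscriber_acked: dict mapping subscriber id -> highest acknowledged op index.
    """
    if not subscriber_acked:
        # Assumption: with no subscribers, CDC pins nothing.
        return list(segments)
    # The slowest subscriber determines how much of the WAL must be kept.
    watermark = min(subscriber_acked.values())
    return [s for s in segments if s.last_index <= watermark]


def needs_resync(earliest_retained_index, acked_index):
    """True if the next op the subscriber needs is no longer in the WAL."""
    return acked_index + 1 < earliest_retained_index


segments = [WalSegment(1, 100), WalSegment(101, 200), WalSegment(201, 300)]
acked = {"subscriber-a": 250, "subscriber-b": 150}
eligible = gc_eligible_segments(segments, acked)
```

In this example only the first segment is eligible for GC, because `subscriber-b` still needs ops from the second one. A brand-new subscriber (acknowledged index 0) joining after ops 1 to 100 have been dropped would get `needs_resync(101, 0) == True`, which is the case where something like the tablet copy bridge would come into play. A real design would presumably also need an upper bound on retention so that a stalled subscriber cannot pin the WAL indefinitely.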