Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Besides the main payload, the majority of connectors (and also many formats) expose additional information that should be readable and (depending on the use case) also writable as metadata.
This can be read-only metadata such as a Kafka read offset or ingestion time. It can also mean adding or removing header information (e.g. a message hash or record version) on every Kafka ProducerRecord. Additionally, users might want to read and write only those parts of the record that contain data but also serve other purposes (e.g. compaction by key).
We should make it possible to read and write data from all of those locations.
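As a rough illustration of the distinction between read-only and writable metadata, here is a minimal Python sketch. It is not connector code: the `Record` class, the `WRITABLE` table, and the key names `offset`, `ingestion_time` and `headers` are hypothetical and chosen only to mirror the examples above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical metadata keys (not taken from any connector).
# True means a sink may also write the key; False means read-only.
WRITABLE: Dict[str, bool] = {
    "offset": False,          # assigned by the broker, so read-only
    "ingestion_time": False,  # assigned on ingestion, so read-only
    "headers": True,          # may be attached to outgoing records
}


@dataclass
class Record:
    """Main payload plus metadata that lives outside of the payload."""
    payload: bytes
    metadata: Dict[str, Any] = field(default_factory=dict)


def read_metadata(record: Record, key: str) -> Any:
    """Every declared key is readable; unknown keys are rejected."""
    if key not in WRITABLE:
        raise KeyError(f"unknown metadata key: {key}")
    return record.metadata.get(key)


def write_metadata(record: Record, key: str, value: Any) -> None:
    """Only keys declared as writable may be set by a sink."""
    if not WRITABLE.get(key, False):
        raise ValueError(f"metadata key is not writable: {key}")
    record.metadata[key] = value


record = Record(b"payload", {"offset": 42, "ingestion_time": 1000})
write_metadata(record, "headers", {"record_version": b"1"})
```

In this sketch, attempting `write_metadata(record, "offset", 7)` raises a `ValueError`, which is one possible way to model metadata that a source exposes but a sink cannot persist.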
Kafka is the source with the most intricacies, as it allows storing data in multiple places of a record. Each of those places is, or can be, serialized differently. Moreover, some of them might serve different purposes:
- all of them can be just a data container,
- key for partitioning (hash on the key),
- key for compaction (if the topic is compacted, records with the same key within a partition are merged),
- timestamp for log retention,
- header for metadata.
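The record locations and two of the key's purposes listed above can be sketched in plain Python. This is an illustration under stated assumptions, not Kafka's implementation: `KafkaLikeRecord`, `partition_for` and `compact` are hypothetical names, CRC32 stands in for the hash Kafka's default partitioner actually uses, and "merging" is simplified to keeping the latest record per key.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KafkaLikeRecord:
    """The places of a record that can carry data (hypothetical model)."""
    key: Optional[bytes]
    value: Optional[bytes]
    timestamp_ms: int
    headers: Dict[str, bytes] = field(default_factory=dict)


def partition_for(key: bytes, num_partitions: int) -> int:
    """Key for partitioning: hash the key and map it to a partition.

    CRC32 is an assumption for this sketch; it only needs to be
    deterministic so that equal keys land in the same partition.
    """
    return zlib.crc32(key) % num_partitions


def compact(partition_log: List[KafkaLikeRecord]) -> List[KafkaLikeRecord]:
    """Key for compaction: keep only the latest record per key.

    Surviving records keep their relative order within the partition.
    """
    latest_index: Dict[Optional[bytes], int] = {}
    for index, record in enumerate(partition_log):
        latest_index[record.key] = index
    return [
        record
        for index, record in enumerate(partition_log)
        if latest_index[record.key] == index
    ]


log = [
    KafkaLikeRecord(b"a", b"v1", 1000),
    KafkaLikeRecord(b"b", b"v2", 1001, {"record_version": b"1"}),
    KafkaLikeRecord(b"a", b"v3", 1002),
]
compacted = compact(log)  # keeps the record for b"b" and the latest for b"a"
```

Because the key drives both partitioning and compaction, a connector that only exposes the value cannot express either use case, which is why the key needs to be readable and writable separately from the value.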
Formats should also be able to expose metadata. FLIP-132 is just one example, where the Debezium format might expose a "db_operation_time" that is not part of the schema itself.
Other use cases could be exposing the Avro version or the Avro schema as meta information per record.
See FLIP-107 for more information:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors