Details
- Type: Improvement
- Status: Patch Available
- Priority: Major
- Resolution: Unresolved
Description
Currently, ConsumeGCPubSub generates one FlowFile per consumed message; the Batch Size property only sets the maximum number of messages pulled from the subscription in a single API call. Creating one FlowFile per message can be extremely inefficient.
As with the Kafka processors, we should add the option to choose between multiple Processing Strategies:
- Flow File: the current behavior. Each message becomes one FlowFile, and FlowFile attributes store the attributes associated with the message as well as metadata such as the message ID and ack ID.
- Demarcator: messages are appended to a single FlowFile, with a custom demarcator between each message. Per-message attributes are lost in this case. This is nevertheless the most efficient strategy when very high throughput is required and the message format allows this approach.
- Record: a reader and a writer can be specified to process the messages. This is useful for changing the message format on the fly, or when the message format does not allow the Demarcator strategy. In addition, an Output Strategy is available with two allowable values:
  - Value: all messages are written to the same FlowFile using the specified writer. Per-message attributes are lost in this case.
  - Wrapper: the writer's schema is overridden to include the message metadata as well as a map of the message's attributes.
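To make the difference between the strategies concrete, here is a minimal, dependency-free sketch in plain Java. It is not the proposed patch and does not use the NiFi or Pub/Sub APIs: `Message` and `Output` are hypothetical stand-ins for a pulled message and a FlowFile, and the attribute and field names (`messageId`, `ackId`, `metadata`, `attributes`, `value`) are assumptions chosen for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcessingStrategySketch {

    // Hypothetical stand-in for a pulled Pub/Sub message (not the client library type).
    static final class Message {
        final String messageId;
        final String ackId;
        final Map<String, String> attributes;
        final byte[] payload;

        Message(String messageId, String ackId, Map<String, String> attributes, String payload) {
            this.messageId = messageId;
            this.ackId = ackId;
            this.attributes = attributes;
            this.payload = payload.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in for a FlowFile: a set of attributes plus content.
    static final class Output {
        final Map<String, String> attributes;
        final byte[] content;

        Output(Map<String, String> attributes, byte[] content) {
            this.attributes = attributes;
            this.content = content;
        }
    }

    // Flow File strategy: one output per message, message attributes preserved.
    static List<Output> flowFilePerMessage(List<Message> batch) {
        List<Output> outputs = new ArrayList<>();
        for (Message m : batch) {
            Map<String, String> attrs = new LinkedHashMap<>(m.attributes);
            attrs.put("messageId", m.messageId); // illustrative attribute names
            attrs.put("ackId", m.ackId);
            outputs.add(new Output(attrs, m.payload));
        }
        return outputs;
    }

    // Demarcator strategy: one output per batch, payloads joined by the
    // demarcator, per-message attributes dropped.
    static Output demarcated(List<Message> batch, String demarcator) {
        byte[] separator = demarcator.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream content = new ByteArrayOutputStream();
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0) {
                content.write(separator, 0, separator.length);
            }
            byte[] payload = batch.get(i).payload;
            content.write(payload, 0, payload.length);
        }
        return new Output(new LinkedHashMap<>(), content.toByteArray());
    }

    // Record strategy with Wrapper output: each record carries the message
    // metadata and an attribute map next to the value (assumed field names).
    static Map<String, Object> wrap(Message m) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("messageId", m.messageId);
        metadata.put("ackId", m.ackId);

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("metadata", metadata);
        record.put("attributes", m.attributes);
        record.put("value", new String(m.payload, StandardCharsets.UTF_8));
        return record;
    }
}
```

With a batch of N messages, `flowFilePerMessage` yields N outputs while `demarcated` yields a single one, which is where the throughput gain would come from. The Wrapper shape shows how per-message attributes could survive batching under the Record strategy.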