Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0.0
    • Component/s: KafkaConnect
    • Labels:
      None

      Description

      Three of the KIP-66 transformations didn't make it into the 0.10.2.0 release: Flatten, Cast, and TimestampConverter.
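      Once these land, a sink connector could chain all three in its config along these lines (a sketch; the `$Value` class-name convention follows the existing KIP-66 transforms, and the field names `price` and `event_time` are made-up placeholders):

```properties
# Chain of three single message transforms, applied in order.
transforms=flatten,cast,ts

# Flatten nested structs, joining field names with "_".
transforms.flatten.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flatten.delimiter=_

# Cast a field to a different primitive type.
transforms.cast.type=org.apache.kafka.connect.transforms.Cast$Value
transforms.cast.spec=price:float64

# Convert a timestamp field to Connect's Timestamp logical type.
transforms.ts.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.ts.field=event_time
transforms.ts.target.type=Timestamp
transforms.ts.format=yyyy-MM-dd HH:mm:ss
```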

        Issue Links

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user ewencp opened a pull request:

          https://github.com/apache/kafka/pull/3065

          KAFKA-4714: TimestampConverter transformation (KIP-66)

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/ewencp/kafka kafka-3209-timestamp-converter

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/kafka/pull/3065.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3065


          commit dbb85346d9ca9a09648832ea95c262dfd1021588
          Author: Ewen Cheslack-Postava <me@ewencp.org>
          Date: 2017-01-27T07:26:02Z

          KAFKA-4714: KIP-66: Flatten and Cast single message transforms

          commit 386978ac4b527eb0c80b66cbada5c6da679433b3
          Author: Ewen Cheslack-Postava <me@ewencp.org>
          Date: 2017-01-27T17:58:26Z

          Update list of transformations in documentation class.

          commit 16a836d1142f1a642c3bbced93aaa2ae0dee4b68
          Author: Ewen Cheslack-Postava <me@ewencp.org>
          Date: 2017-01-28T04:53:49Z

          Handle null values for optional fields in Flatten transformation.

          commit ad92662e257d652fad4224b2ac85e4428946734d
          Author: Ewen Cheslack-Postava <me@ewencp.org>
          Date: 2017-05-14T22:47:13Z

          Address review comments and checkstyle issues

          commit 7b234982f99a612b7ca03a088aa8a28b2be8e38f
          Author: Ewen Cheslack-Postava <me@ewencp.org>
          Date: 2017-05-15T02:00:39Z

          Make Flatten transformation handle optionality and default values from ancestors

          commit 9eafd31a8471b96208a1cb3bf6bcd568b15c3839
          Author: Ewen Cheslack-Postava <me@ewencp.org>
          Date: 2017-05-15T17:26:50Z

          KAFKA-4714: KIP-66: TimestampConverter single message transform


          hachikuji Jason Gustafson added a comment -

          Issue resolved by pull request 2458
          https://github.com/apache/kafka/pull/2458

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/kafka/pull/2458

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/kafka/pull/3065

          dhananjaydp Dhananjay Patkar added a comment (edited) -

          Currently, on the sink side, transformations are applied before the record is consumed.

          For sink connectors, transformations are applied to the collection of SinkRecords before they are provided to SinkTask.put().

          Is it possible to get the original ConnectRecord object in the consumer task?
          If it is not available, can we preserve the original message, or is it possible to add support for a post-SinkTask.put() transformation?

          ewencp Ewen Cheslack-Postava added a comment -

          Dhananjay Patkar The record is maintained internally, but it is not exposed to the connector. It's somewhat counter to the point of SMTs since in some cases you may be trying to do things like remove PII, in which case you definitely don't want to give the connector task a chance to see that data.

          Regarding support for a transformation after SinkTask.put(), that doesn't really make sense since the whole point of put() is that it passes the data to the task to be written to the external system – not only is the message in a different system, it's likely been converted to some native format for that system and is no longer in Connect's data API format.

          Is there a reason you want the original value in the task? Can you describe your use case?
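          The PII concern is easiest to see with one of the existing KIP-66 transforms: a config roughly like the following (a sketch; `ssn` is a made-up field name) masks the sensitive field before the sink task ever sees the record, which is exactly why the pre-transform record is not exposed:

```properties
transforms=mask
transforms.mask.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.mask.fields=ssn
```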

          Show
          ewencp Ewen Cheslack-Postava added a comment - Dhananjay Patkar The record is maintained internally, but it is not exposed to the connector. It's somewhat counter to the point of SMTs since in some cases you may be trying to do things like remove PII, in which case you definitely don't want to give the connector task a chance to see that data. Regarding support for a transformation after SinkTask.put(), that doesn't really make sense since the whole point of put() is that it passes the data to the task to be written to the external system – not only is the message in a different system, it's likely been converted to some native format for that system and is no longer in Connect's data API format. Is there a reason you want the original value in the task? Can you describe your use case?
          Hide
          dhananjaydp Dhananjay Patkar added a comment (edited) -

          Thanks Ewen Cheslack-Postava for the quick reply.

          I have a use case wherein I need to map the incoming raw message to a standard format and persist it, and also persist the raw message as-is.
          Converting the raw message to the standard format is not always guaranteed to succeed, but I still need to persist the raw message.

          I understand I can write multiple consumers on the same topic: one consumer persists raw messages as-is, and the other transforms the raw message into the standard format and persists it.

          I just wanted to know: is there a way I can do this as a single consumer through an SMT?

          "The record is maintained internally, but it is not exposed to the connector"
          Is there a possibility to expose the raw message based on the configuration of the SMT?

          ewencp Ewen Cheslack-Postava added a comment -

          Dhananjay Patkar You cannot do that with SMTs; you would need to define two connectors, as you describe. The KIP probably could have included more information about why. Your use case is really a special case of the more general problem of a single record generating multiple outputs, with some of those outputs possibly transformed in some way. Supporting this would significantly complicate the system, in ways that would make it difficult to provide the guarantees we provide today (e.g. at-least-once delivery). It also messes with some core concepts in Connect, such as having unique (partition, offset) values for each message. Complex processing topologies like this are better suited to Kafka Streams, which was built for exactly that purpose.

          Note that you can do this without SMTs if you write your own connector that can safely handle each message resulting in multiple writes to some output system. You could still use the transformations internally in that connector if you wanted to reuse the existing transformation code; they just wouldn't be configured via the standard mechanism in Connect.

          As for the specific use case of the original message: if someone made a strong case for it and wrote a KIP describing how it would be implemented, sure. We're open to plenty of ideas about how to improve the framework.
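          The "apply the transform inside your own connector" suggestion can be sketched in plain Java. This is purely illustrative: the `Transformation` interface and `put()` method below are simplified stand-ins for the Kafka Connect API (which operates on `SinkRecord`s), not the real classes. Each incoming record produces two writes, one raw and one transformed, with the task itself invoking the transform rather than the framework's SMT chain:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types stand in for Connect's SinkRecord and
// Transformation interfaces; they are not the real Kafka Connect API.
class DualWriteSinkSketch {
    interface Transformation {
        Map<String, Object> apply(Map<String, Object> value);
    }

    // Stand-in for a Flatten-style transform: hoists nested map entries to the
    // top level, joining key paths with "_".
    static class FlattenLike implements Transformation {
        @Override
        public Map<String, Object> apply(Map<String, Object> value) {
            Map<String, Object> flat = new HashMap<>();
            for (Map.Entry<String, Object> e : value.entrySet()) {
                if (e.getValue() instanceof Map) {
                    for (Map.Entry<?, ?> n : ((Map<?, ?>) e.getValue()).entrySet()) {
                        flat.put(e.getKey() + "_" + n.getKey(), n.getValue());
                    }
                } else {
                    flat.put(e.getKey(), e.getValue());
                }
            }
            return flat;
        }
    }

    public final Transformation transform = new FlattenLike();
    public final List<Map<String, Object>> written = new ArrayList<>();

    // Inside put(), each record results in two writes to the output system:
    // the raw value as-is, and the standardized (transformed) value.
    public void put(List<Map<String, Object>> records) {
        for (Map<String, Object> value : records) {
            written.add(value);                  // raw copy, persisted as-is
            written.add(transform.apply(value)); // standardized copy
        }
    }
}
```

          Because the connector controls both writes itself, it can keep the guarantees Connect provides for a single (partition, offset) per message, which is what a framework-level fan-out would complicate.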


            People

            • Assignee:
              ewencp Ewen Cheslack-Postava
            • Reporter:
              ewencp Ewen Cheslack-Postava
            • Votes:
              2
            • Watchers:
              6

              Dates

              • Created:
              • Updated:
              • Resolved:

                Development