KAFKA-12444

Add TranslateSurrogates SMT to Kafka Connect

Details

      # TranslateSurrogates

      The following provides usage information for the Kafka Connect SMT
      `io.confluent.kafka.connect.transforms.custom.translatesurrogates.TranslateSurrogates`.

      ## Description
      Translates UTF-16 [surrogate pairs](https://en.wikipedia.org/wiki/UTF-16#Code_points_from_U+010000_to_U+10FFFF) in the message into one of the following, depending on the configured mode: their corresponding UTF-8 [URL encoding](https://en.wikipedia.org/wiki/Percent-encoding), the default replacement character `U+FFFD` or a configured replacement string, or Java-[notated](https://docs.oracle.com/javase/tutorial/i18n/text/string.html) UTF-16.
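
The three outcomes described above can be illustrated with a small Python sketch. This is not the SMT's implementation (which is Java); it only shows what each mode would plausibly produce for one character outside the Basic Multilingual Plane. The function names, the `\uXXXX` output format for `java-encode`, and the choice of one replacement per surrogate pair are assumptions made for illustration.

```python
from urllib.parse import quote


def utf16_units(ch):
    """Return the UTF-16 code units (as ints) that encode one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def translate(text, mode="url-encode", replacement="\ufffd"):
    """Illustrative stand-in for the transform, applied to a single string."""
    out = []
    for ch in text:
        if ord(ch) <= 0xFFFF:
            # Needs no surrogate pair in UTF-16, so it is passed through.
            out.append(ch)
        elif mode == "url-encode":
            # Percent-encode the character's UTF-8 bytes.
            out.append(quote(ch, safe=""))
        elif mode == "java-encode":
            # Assumed format: one \uXXXX escape per UTF-16 code unit.
            out.append("".join("\\u%04X" % unit for unit in utf16_units(ch)))
        elif mode == "replace":
            # Assumed behavior: one replacement per surrogate pair.
            out.append(replacement)
        else:
            raise ValueError("unknown mode: %r" % mode)
    return "".join(out)


sample = "hi \U0001F600"  # U+1F600 is stored as the pair D83D DE00 in UTF-16
print(translate(sample, "url-encode"))   # hi %F0%9F%98%80
print(translate(sample, "java-encode"))  # hi \uD83D\uDE00
print(translate(sample, "replace", "?")) # hi ?
```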

      The transform applies to string fields only. Fields can optionally be specified, and may sit at any depth inside structs, arrays, or maps.
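
A rough Python sketch of that traversal follows, with dicts standing in for structs and maps and lists standing in for arrays. How the real SMT matches field names at depth is not stated here, so matching a name at any nesting level is an assumption, as are the names `walk` and `replace_astral`.

```python
def replace_astral(text):
    """Stand-in translation: swap each non-BMP character for '?'."""
    return "".join("?" if ord(ch) > 0xFFFF else ch for ch in text)


def walk(value, fn, fields=None, selected=None):
    """Apply fn to string values, recursing through dicts and lists.

    With fields=None every string is translated. Otherwise only strings
    found under a key listed in fields (at any depth) are translated.
    """
    if selected is None:
        selected = fields is None
    if isinstance(value, str):
        return fn(value) if selected else value
    if isinstance(value, dict):
        return {
            key: walk(item, fn, fields,
                      selected or (fields is not None and key in fields))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [walk(item, fn, fields, selected) for item in value]
    return value  # non-string leaves are left untouched


record = {
    "field1": "a\U0001F600",
    "nested": {"field2": ["b\U0001F600", 7], "other": "c\U0001F600"},
}
print(walk(record, replace_astral))
print(walk(record, replace_astral, fields={"field1", "field2"}))
```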

      Use the concrete transformation type designed for the record key (`TranslateSurrogates$Key`, that is, `Key.class.getName()`) or value (`TranslateSurrogates$Value`, that is, `Value.class.getName()`).
      ## Example

      This configuration snippet shows how to use `TranslateSurrogates` to translate string fields in the message value.

      ```
          "transforms": "translatesurrogates",
          "transforms.translatesurrogates.type": "io.confluent.kafka.connect.transforms.custom.translatesurrogates.TranslateSurrogates$Value",
          "transforms.translatesurrogates.mode": "url-encode",
          "transforms.translatesurrogates.fields" : [
              "field1",
              "field2"
          ]
      ```

      This translates all surrogates found in `field1` and `field2` in the value of the message, replacing them with their corresponding [URL-encoded](https://en.wikipedia.org/wiki/Percent-encoding) equivalents.
      ## Properties
      | Name | Description | Type | Default | Importance |
      | :--- | :---- | :--- | :--- | :--- |
      | mode | Translation mode for the transform. Must be one of `url-encode`, `java-encode`, or `replace`. | string | `url-encode` | high |
      | fields | Names of the fields in which to translate surrogates. If not specified, the entire message is parsed for strings to translate. | list | none | medium |
      | replace | Custom replacement string applied to surrogates found in all `fields` values (non-empty string values only). Works only with `replace` mode. | string | `U+FFFD` | low |
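
To show how the three properties fit together, here is a hedged Python sketch that assembles the transform properties and prints them as JSON. The helper `smt_config` is hypothetical, and rejecting `replace` outside `replace` mode is an assumption; the table only says the property works with that mode, not how the SMT reacts otherwise.

```python
import json

SMT_CLASS = ("io.confluent.kafka.connect.transforms.custom."
             "translatesurrogates.TranslateSurrogates")
VALID_MODES = {"url-encode", "java-encode", "replace"}


def smt_config(alias, mode="url-encode", fields=None, replace=None):
    """Build the properties for one TranslateSurrogates instance (value side)."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s" % sorted(VALID_MODES))
    if replace is not None and mode != "replace":
        # Assumption: fail fast instead of silently ignoring the setting.
        raise ValueError("replace is only meaningful with mode=replace")
    prefix = "transforms.%s" % alias
    config = {
        "transforms": alias,
        prefix + ".type": SMT_CLASS + "$Value",
        prefix + ".mode": mode,
    }
    if fields:
        config[prefix + ".fields"] = list(fields)
    if replace is not None:
        config[prefix + ".replace"] = replace
    return config


print(json.dumps(
    smt_config("translatesurrogates", mode="replace",
               fields=["field1"], replace="?"),
    indent=4))
```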


      ## Predicates

      Transformations can be configured with predicates so that the transformation is applied only to records which satisfy a condition. You can use predicates in a transformation chain and, when combined with the [Filter (Apache Kafka)](https://docs.confluent.io/platform/current/connect/transforms/filter-ak.html#ak-filter), predicates can conditionally filter out specific records. For details and examples, see [Predicates](https://docs.confluent.io/platform/current/connect/transforms/filter-ak.html#predicates).
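
As an illustration of the predicate wiring described above, the following Python sketch builds a configuration that applies the transform only to records from matching topics. `TopicNameMatches` and the `predicates.*` / `transforms.<alias>.predicate` keys come from Apache Kafka Connect; the predicate alias `isOrdersTopic` and the topic pattern are hypothetical.

```python
import json

alias = "translatesurrogates"
predicate = "isOrdersTopic"  # hypothetical predicate alias

config = {
    "transforms": alias,
    "transforms.%s.type" % alias:
        "io.confluent.kafka.connect.transforms.custom."
        "translatesurrogates.TranslateSurrogates$Value",
    "transforms.%s.mode" % alias: "url-encode",
    # Attach the predicate to the transform by its alias.
    "transforms.%s.predicate" % alias: predicate,
    "predicates": predicate,
    "predicates.%s.type" % predicate:
        "org.apache.kafka.connect.transforms.predicates.TopicNameMatches",
    # Hypothetical pattern: only topics whose names start with "orders-".
    "predicates.%s.pattern" % predicate: "orders-.*",
}

print(json.dumps(config, indent=4))
```

Setting `transforms.translatesurrogates.negate` to `true` would invert the condition, so the transform would run on every record that does not match the predicate.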

    Description

      This change adds translation of Unicode surrogate pairs to Kafka Connect. A PR has been submitted: https://github.com/apache/kafka/pull/10287.

          People

            Assignee: Unassigned
            Reporter: Siva Kunapuli (sivakunapuli)
              Time Tracking

                Original Estimate: 48h
                Remaining Estimate: 48h
                Time Spent: Not Specified