Kafka / KAFKA-9436

New Kafka Connect SMT for plain text => Struct (or Map)

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: KafkaConnect
    • Labels: None

      Description

      I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load the result into document databases such as MongoDB, Elasticsearch, etc.

       

      For example

       

      1. String parsing (with a TIMEMILLIS field)

      {
         "code": "dev_kafka_pc001_1580372261372",
         "recode1": "a",
         "recode2": "b"
      }

      "transforms": "RegexTransform",
      "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
      "transforms.RegexTransform.struct.field": "message",
      "transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
      "transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime:TIMEMILLIS"
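
To illustrate what the config above is meant to do, here is a minimal Java sketch of the regex-plus-mapping idea using only the JDK. It is not the ToStructByRegexTransform code from the PR. The class name RegexMappingSketch, the "STRING" default, and the conversions (TIMEMILLIS to java.time.Instant, NUMBER to long) are assumptions made for this example. It also assumes the mapping lists exactly one name per capture group.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; the real SMT in the PR may behave differently.
final class RegexMappingSketch {

    // Names capture group i+1 after the i-th mapping entry ("name" or "name:TYPE").
    static Map<String, Object> parse(String text, String regex, String mapping) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("input does not match regex: " + text);
        }
        Map<String, Object> out = new LinkedHashMap<>();
        String[] entries = mapping.split(",");
        for (int i = 0; i < entries.length; i++) {
            String[] nameAndType = entries[i].trim().split(":", 2);
            String type = nameAndType.length == 2 ? nameAndType[1] : "STRING";
            out.put(nameAndType[0], convert(m.group(i + 1), type));
        }
        return out;
    }

    // Assumed type handling: TIMEMILLIS -> Instant, NUMBER -> long, anything else stays a String.
    static Object convert(String raw, String type) {
        switch (type) {
            case "TIMEMILLIS":
                return Instant.ofEpochMilli(Long.parseLong(raw));
            case "NUMBER":
                return Long.parseLong(raw);
            default:
                return raw;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> fields = parse(
                "dev_kafka_pc001_1580372261372",
                "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
                "env,serviceId,device,sequence,datetime:TIMEMILLIS");
        fields.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With the sample value, the five capture groups become env, serviceId, device, sequence, and datetime, in that order. A real SMT would presumably build a Connect Struct with a schema instead of a plain Map.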

       

       

      2. Plain-text Apache log

      "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
      

      The SMT connector config below uses a regular expression to transform this plain text into Struct (or Map) data.

       

      "transforms": "RegexTransform",
      "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
      "transforms.RegexTransform.struct.field": "message",
      "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
      "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
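
A regex this long is easy to get out of step with its mapping, so a quick standalone check against the sample line can help before deploying the connector. The sketch below is an assumption-laden illustration, not part of the proposed SMT. ApacheLogRegexCheck and namedGroups are hypothetical names, and it only pairs raw captured strings with mapping names, ignoring type suffixes such as :NUMBER. The Java string literals use the same escaping as the JSON config values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking the regex and mapping from the config above.
final class ApacheLogRegexCheck {

    static final String REGEX =
            "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
            + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
            + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"";

    static final String MAPPING =
            "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
            + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";

    static final String SAMPLE =
            "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] "
            + "\"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 "
            + "\"http://local.test.com/\" "
            + "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    // Returns mapping name -> raw captured text, failing fast on a count mismatch.
    static Map<String, String> namedGroups(String line) {
        Matcher m = Pattern.compile(REGEX).matcher(line);
        String[] names = MAPPING.split(",");
        if (names.length != m.groupCount()) {
            throw new IllegalStateException("mapping has " + names.length
                    + " names but regex has " + m.groupCount() + " groups");
        }
        if (!m.find()) {
            throw new IllegalArgumentException("line does not match regex");
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            // Drop an optional ":TYPE" suffix; this check only looks at names.
            out.put(names[i].trim().split(":", 2)[0], m.group(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        namedGroups(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

For the sample line, BytesSent captures the literal "-", which is why it is left as a plain string in the mapping while Ms carries the NUMBER suffix.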
      

       

      I have a PR for this.

            People

            • Assignee: Unassigned
            • Reporter: whsoul82 whsoul