Kafka / KAFKA-9436

New Kafka Connect SMT for plain text => Struct (or Map)


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: KafkaConnect

    Description

      I'd like to use an SMT to parse plain-text rows, convert them to Struct (or Map) data, and load them into document databases such as MongoDB, Elasticsearch, etc.

       

      For example:

       

      1. Parsing a string (with an epoch-milliseconds timestamp)

      {
         "code" : "dev_kafka_pc001_1580372261372"
         ,"recode1" : "a"
         ,"recode2" : "b" 
      }
      "transforms": "RegexTransform",
      "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
      
      "transforms.RegexTransform.struct.field": "message",
      "transforms.RegexTransform.regex": "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
      "transforms.RegexTransform.mapping": "env,serviceId,device,sequence,datetime:TIMEMILLIS"
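      The intended semantics of the `regex` + `mapping` pair can be sketched in plain Java with `java.util.regex`. This is a minimal sketch under stated assumptions, not the PR's implementation: it assumes the i-th name in `mapping` labels capture group i, that an optional `:TYPE` suffix selects a conversion, and that `TIMEMILLIS` means epoch milliseconds (converted here to `java.time.Instant`) while `NUMBER` becomes a `Long`. The class `RegexMappingSketch` and its methods are hypothetical.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the code from the PR: applies a "regex" + "mapping"
// pair, as in the config above, to one plain-text value and returns a Map.
final class RegexMappingSketch {

    // Assumption: mapping entries are "name" or "name:TYPE"; entry i names capture group i+1.
    static Map<String, Object> apply(String regex, String mapping, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        String[] entries = mapping.split(",");
        // groupCount() is known from the pattern alone, so this check needs no match.
        if (entries.length != m.groupCount()) {
            throw new IllegalArgumentException(
                entries.length + " mapping names but " + m.groupCount() + " capture groups");
        }
        if (!m.find()) {
            return null; // no match: a real SMT needs a policy (pass through, fail, ...)
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < entries.length; i++) {
            String[] nameAndType = entries[i].trim().split(":", 2);
            String type = nameAndType.length > 1 ? nameAndType[1] : "STRING";
            fields.put(nameAndType[0], convert(m.group(i + 1), type));
        }
        return fields;
    }

    // Assumed type hints: TIMEMILLIS -> Instant, NUMBER -> Long, anything else -> String.
    static Object convert(String raw, String type) {
        if (raw == null) {
            return null; // optional group that did not take part in the match
        }
        switch (type) {
            case "TIMEMILLIS":
                return Instant.ofEpochMilli(Long.parseLong(raw));
            case "NUMBER":
                return Long.parseLong(raw); // throws NumberFormatException on non-digits
            default:
                return raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "^(.{3,4})_(.*)_(pc|mw|ios|and)([0-9]{3})_([0-9]{13})",
            "env,serviceId,device,sequence,datetime:TIMEMILLIS",
            "dev_kafka_pc001_1580372261372"));
    }
}
```

      For `dev_kafka_pc001_1580372261372` this yields env=dev, serviceId=kafka, device=pc and sequence=001 (kept as a string, so the leading zeros survive), plus datetime as the instant 1580372261372 ms after the epoch. Comparing the group count with the number of mapping names up front catches a config in which the two have drifted apart.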

       

       

      2. Parsing a plain-text Apache access log line

      "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" 200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\""
      

      The following SMT connector config uses a regular expression to transform this plain-text line into Struct (or Map) data.

       

      "transforms": "RegexTransform",
      "transforms.RegexTransform.type": "org.apache.kafka.connect.transforms.ToStructByRegexTransform$Value",
      
      "transforms.RegexTransform.struct.field": "message",
      "transforms.RegexTransform.regex": "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" (\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"",
      
      "transforms.RegexTransform.mapping": "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,Response,BytesSent,Ms:NUMBER,Referrer,UserAgent"
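      One way to sanity-check a config like this before deploying it is to run the same pattern in plain Java against the sample line, since a Java string literal uses the same backslash and quote escaping as the JSON value. The sketch assumes, as the mapping suggests, that the i-th name labels capture group i and that `:NUMBER` means a numeric conversion (done here with `Long.valueOf`); the class name `ApacheLogRegexCheck` is hypothetical and this is not code from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check, NOT from the PR: the literals below are escaped exactly like
// the JSON config values, so REGEX compiles to the configured pattern.
final class ApacheLogRegexCheck {

    static final String REGEX =
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "
        + "\"(GET|POST|OPTIONS|HEAD|PUT|DELETE|PATCH) (.+?) (.+?)\" "
        + "(\\d{3}) ([0-9|-]+) ([0-9|-]+) \"([^\"]+)\" \"([^\"]+)\"";

    static final String MAPPING =
        "IP,RemoteUser,AuthedRemoteUser,DateTime,Method,Request,Protocol,"
        + "Response,BytesSent,Ms:NUMBER,Referrer,UserAgent";

    static final String SAMPLE =
        "111.61.73.113 - - [08/Aug/2019:18:15:29 +0900] \"OPTIONS /api/v1/service_config HTTP/1.1\" "
        + "200 - 101989 \"http://local.test.com/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36\"";

    static Map<String, Object> parse(String line) {
        Matcher m = Pattern.compile(REGEX).matcher(line);
        String[] names = MAPPING.split(",");
        // One mapping name per capture group, otherwise the config is inconsistent.
        if (names.length != m.groupCount()) {
            throw new IllegalStateException(names.length + " names vs " + m.groupCount() + " groups");
        }
        if (!m.find()) {
            return null;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            String[] nameAndType = names[i].split(":", 2);
            String raw = m.group(i + 1);
            Object value = raw;
            // Only NUMBER is converted; a "-" in a NUMBER field would throw here.
            if (nameAndType.length > 1 && nameAndType[1].equals("NUMBER")) {
                value = Long.valueOf(raw);
            }
            out.put(nameAndType[0], value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

      On the sample line this gives a 12-entry map in which BytesSent stays the string "-" while Ms becomes the number 101989. Because `([0-9|-]+)` also accepts "-", a log line with "-" in the Ms position would make the NUMBER conversion fail; how the proposed SMT handles that case is not described in this issue.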
      

       

      I have a PR for this.


          People

            Assignee: Unassigned
            Reporter: whsoul82 (whsoul)
            Votes: 0
            Watchers: 2
