Flume / FLUME-2818

Problems with Avro data and not Json and no data in HDFS


Details

    • Type: Request
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.2
    • Fix Version/s: 1.5.2
    • Component/s: Sinks+Sources
    • Labels: None
    • Environment: HDP-2.3.0.0-2557 Sandbox

    Description

      Flume delivers Twitter data in Avro format rather than JSON.
      Why?
      Flume agent configuration:
      TwitterAgent.sources = Twitter
      TwitterAgent.channels = MemChannel
      TwitterAgent.sinks = HDFS

      TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
      TwitterAgent.sources.Twitter.channels = MemChannel
      TwitterAgent.sources.Twitter.consumerKey = xxx
      TwitterAgent.sources.Twitter.consumerSecret = xxx
      TwitterAgent.sources.Twitter.accessToken = xxx
      TwitterAgent.sources.Twitter.accessTokenSecret = xxx
      TwitterAgent.sources.Twitter.maxBatchSize = 10
      TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200
      TwitterAgent.sources.Twitter.keywords = United Nations
      TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL

      # HDFS Sink
      TwitterAgent.sinks.HDFS.channel = MemChannel
      TwitterAgent.sinks.HDFS.type = hdfs
      TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S
      TwitterAgent.sinks.HDFS.hdfs.filePrefix = events
      TwitterAgent.sinks.HDFS.hdfs.round = true
      TwitterAgent.sinks.HDFS.hdfs.roundValue = 5
      TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute
      TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
      TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
      TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

      TwitterAgent.channels.MemChannel.type = memory
      TwitterAgent.channels.MemChannel.capacity = 1000
      TwitterAgent.channels.MemChannel.transactionCapacity = 100
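To make the effect of the `hdfs.path` escapes and the rounding settings in this configuration concrete, here is a minimal Python sketch. It assumes that the escapes used here (`%y`, `%m`, `%d`, `%H`, `%M`, `%S`) behave like their `strftime` counterparts and that `hdfs.round` rounds the timestamp down to a multiple of `roundValue`; the function names `round_down` and `expand_path` are made up for illustration and are not part of Flume.

```python
from datetime import datetime


def round_down(ts, value, unit):
    """Round ts down to a multiple of `value` units (assumed hdfs.round behaviour)."""
    if unit == "second":
        return ts.replace(second=ts.second - ts.second % value, microsecond=0)
    if unit == "minute":
        return ts.replace(minute=ts.minute - ts.minute % value,
                          second=0, microsecond=0)
    if unit == "hour":
        return ts.replace(hour=ts.hour - ts.hour % value,
                          minute=0, second=0, microsecond=0)
    raise ValueError("unsupported round unit: " + unit)


def expand_path(pattern, ts, round_value=None, round_unit=None):
    """Expand the date escapes in an hdfs.path-style pattern for one event time."""
    if round_value is not None:
        ts = round_down(ts, round_value, round_unit)
    # Only the escapes used in this config are covered; they match strftime.
    return ts.strftime(pattern)


pattern = "/demo/tweets/stream/%y-%m-%d/%H%M%S"
event_time = datetime(2015, 10, 20, 13, 26, 5)
print(expand_path(pattern, event_time, 5, "minute"))
# /demo/tweets/stream/15-10-20/132500
```

Under these assumptions the `%S` part of the directory name is always `00` and a new directory is started every five minutes, so the output is spread over many small directories.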

      Twitter Data from Flume:
      Obj avro.schema�
      {"type":"record","name":"Doc","doc":"adoc","fields":[
      {"name":"id","type":"string"},
      {"name":"user_friends_count","type":["int","null"]},
      {"name":"user_location","type":["string","null"]},
      {"name":"user_description","type":["string","null"]},
      {"name":"user_statuses_count","type":["int","null"]},
      {"name":"user_followers_count","type":["int","null"]},
      {"name":"user_name","type":["string","null"]},
      {"name":"user_screen_name","type":["string","null"]},
      {"name":"created_at","type":["string","null"]},
      {"name":"text","type":["string","null"]},
      {"name":"retweet_count","type":["long","null"]},
      {"name":"retweeted","type":["boolean","null"]},
      {"name":"in_reply_to_user_id","type":["long","null"]},
      {"name":"source","type":["string","null"]},
      {"name":"in_reply_to_status_id","type":["long","null"]},
      {"name":"media_url_https","type":["string","null"]},
      {"name":"expanded_url","type":["string","null"]}
      ]}�]3hˊى���|����$656461386520784896� �お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n� ユハズ
      yhzz_(2015-10-20T13:26:05Z� はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める https://t.co/ZpfDqw4l9g � <a href=" http://twitter.com" rel="nofollow">Twitter Web Client</a> ^ https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984� <Mundo de las sombras (Cc,Extr)�#RP User de un agente del gobierno |20| Que no me veais ni noteis mi presencia no quiere decir que no os este observando desde las sombras�� � JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z� RT @NaiiVicious: @Lisi_Hattori @UserSinCausa https://t.co/M2LTJWwqae � <a href=" http://twitter.com/download/android" rel="nofollow">Twitter for Android</a> ^ https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpghttp://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|���

      After loading this Twitter data into an HDFS table, we cannot convert it to JSON with avro-tools-1.7.7.jar; we get the error message "No data".
      When we try to read the file, we get the following error:
      "java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json
      Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.EOFException"
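The `Obj` marker and embedded `avro.schema` at the start of the dump are the header of an Avro object container file, the format the Flume TwitterSource is documented to emit. An `EOFException` from `tojson` may mean that the file ends in the middle of a data block, for example because it was copied while still being written, though that is only one possible cause. The following Python sketch, which uses only the standard library, reads the header and walks the data blocks to report whether the file is complete. The names `inspect_avro`, `read_header`, `read_exact`, and `read_long` are illustrative helpers, not part of any Avro library.

```python
import io

MAGIC = b"Obj\x01"   # first four bytes of an Avro object container file
SYNC_SIZE = 16       # length of the sync marker that ends header and blocks


def read_long(stream):
    """Decode one zigzag varint (Avro's encoding for int and long)."""
    shift = 0
    value = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("file ended inside a varint")
        value |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return (value >> 1) ^ -(value & 1)
        shift += 7


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("wanted %d bytes, got %d" % (n, len(data)))
    return data


def read_header(stream):
    """Return (metadata dict, sync marker) from the file header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:           # a zero count ends the metadata map
            break
        if count < 0:            # negative count is followed by a byte size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_exact(stream, read_long(stream)).decode("utf-8")
            meta[key] = read_exact(stream, read_long(stream))
    return meta, read_exact(stream, SYNC_SIZE)


def inspect_avro(data):
    """Summarise an Avro container file held in memory as bytes."""
    stream = io.BytesIO(data)
    meta, sync = read_header(stream)
    blocks = records = 0
    truncated = False
    while stream.tell() < len(data):
        try:
            count = read_long(stream)       # records in this block
            size = read_long(stream)        # serialized size in bytes
            read_exact(stream, size)        # skip the (possibly compressed) data
            if read_exact(stream, SYNC_SIZE) != sync:
                raise ValueError("sync marker mismatch")
        except EOFError:
            truncated = True
            break
        blocks += 1
        records += count
    return {
        "schema": meta.get("avro.schema", b"").decode("utf-8"),
        "blocks": blocks,
        "records": records,
        "truncated": truncated,
    }
```

Calling `inspect_avro(open("twitter.avro", "rb").read())` on a local copy of the file would then show whether it contains any complete blocks at all. A result with zero blocks and `truncated` set to false means the file holds only a header, and `truncated` set to true means the last block is cut off.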

      I hope you can help us.

      Kind regards,
      Karl

          People

            Assignee: Unassigned
            Reporter: Kettler Karl
            Votes: 0
            Watchers: 3
