  1. Flume
  2. FLUME-205

TailSource reads lines using a method (readLine) that does character set interpretation, which breaks all my UTF-8 characters


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.9.1
    • Fix Version/s: 0.9.2
    • Component/s: Node
    • Labels: None
    • Environment: Debian Lenny, with all files in UTF-8 encoding

    Description

      Flume tails a file that is encoded in UTF-8; opening the file shows me ä, ö, ü and other characters. When I open the seq files in Hadoop, which were transmitted and stored by Flume through the collectorSink in raw format, all special characters like ä, ö, ü are broken (ä, for example, comes out garbled). It seems there might be a conversion somewhere between UTF-8 and another encoding, or is the raw output format the problem?
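
      The symptom described above is what one would expect if UTF-8 bytes were decoded with a single-byte charset somewhere along the way. The following is a minimal sketch of that effect, assuming ISO-8859-1 as the wrong charset (the report does not say which charset was actually involved). The class and method names are hypothetical and not part of Flume. Under that assumption, ä (bytes C3 A4) turns into the two characters Ã¤.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Flume code.
class MojibakeDemo {
    // Encode the text as UTF-8, then decode the bytes with the wrong
    // charset (ISO-8859-1 is an assumption), producing one char per byte.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the damage: ISO-8859-1 maps every char below U+0100 back to
    // the same byte value, so the original UTF-8 bytes can be recovered.
    static String repair(String broken) {
        byte[] raw = broken.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e4\u00f6\u00fc"; // the three umlauts from the report
        String broken = misdecode(text);
        System.out.println(broken.length());            // 6
        System.out.println(repair(broken).equals(text)); // true
    }
}
```

      The round trip in repair only works because ISO-8859-1 is lossless for byte values; with other wrong charsets the original bytes may not be recoverable.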

      From Jon:
      "I think the bug is in TailSource – it reads lines using a method (readLine) which does character set interpretation."

      Original discussion:
      https://groups.google.com/a/cloudera.org/group/flume-user/browse_thread/thread/20231a0f98569d8a#
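
      Jon's diagnosis quoted above points at a readLine call that interprets bytes as characters. The report does not say which readLine is meant; java.io.RandomAccessFile.readLine, for one, is documented to turn each byte into a char by zeroing the high eight bits, which does not support the full Unicode character set. One way to avoid this kind of problem is to split lines on the raw newline byte and leave the bytes untouched. The sketch below illustrates that idea under those assumptions; it is not the fix that went into Flume, and RawLineReader and readLineBytes are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Flume code.
class RawLineReader {
    // Returns the bytes of the next line without the trailing '\n',
    // or null once the stream is exhausted. No charset is applied, so
    // multi-byte UTF-8 sequences pass through unchanged. Splitting on
    // 0x0A is safe because that byte never occurs inside a multi-byte
    // UTF-8 sequence. A '\r' before '\n' is kept as part of the line.
    static byte[] readLineBytes(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return buf.toByteArray();
            }
            buf.write(b);
        }
        // End of stream: return a final unterminated line if there is one.
        return buf.size() > 0 ? buf.toByteArray() : null;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "gr\u00fc\u00dfe\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        byte[] line;
        while ((line = readLineBytes(in)) != null) {
            // Decode explicitly, and only where text is actually needed.
            System.out.println(new String(line, StandardCharsets.UTF_8));
        }
    }
}
```

      Keeping the line as a byte array means the decision about the charset is made explicitly by whoever finally turns the bytes into text, rather than implicitly by the reader.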

            People

              Assignee: Jonathan Hsieh (jmhsieh)
              Reporter: flume_dboek (disabled imported user)
              Votes: 0
              Watchers: 0
