Flume / FLUME-3079

HDFS sink using Snappy compression: .tmp file cannot be read correctly while data is being written


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.2
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      I'm using the HDFS sink with the Snappy compression codec. While JSON events are being written into HDFS, a .snappy.tmp file is generated. If I try to access the data in that tmp file with Hive, I get a JSON parsing error.

      I think the reason is that the HDFS sink has already written some Snappy-format content into the tmp file, but since the file is not yet finished, the Snappy stream is incomplete and cannot be recognized by the Hive JSON SerDe. After the file is rolled into a normal Snappy file, it can be processed correctly.

      So is there a way to keep the data in plain text format while it is being written to the tmp file, and convert it to Snappy format after the tmp file is rolled?
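
      For reference, a minimal HDFS sink configuration that reproduces this setup might look like the sketch below (the agent, channel, and path names are hypothetical). One commonly used mitigation, rather than changing the on-disk format mid-write, is to set `hdfs.inUsePrefix` to `_` or `.`, since Hadoop's default input path filter skips files whose names start with those characters, so Hive never tries to read the in-progress file:

      ```properties
      # Hypothetical agent/sink names; only the hdfs.* keys matter here.
      a1.sinks.k1.type = hdfs
      a1.sinks.k1.channel = c1
      a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d

      # CompressedStream + codeC enables Snappy compression of the stream.
      a1.sinks.k1.hdfs.fileType = CompressedStream
      a1.sinks.k1.hdfs.codeC = snappy

      # In-progress files get this suffix (.tmp is the Flume default).
      a1.sinks.k1.hdfs.inUseSuffix = .tmp
      # Possible mitigation: a leading "_" makes Hive/MapReduce ignore
      # the file until it is rolled and renamed.
      a1.sinks.k1.hdfs.inUsePrefix = _
      ```

      With the prefix in place, the open file appears as e.g. `_FlumeData.1234.snappy.tmp` and is skipped by Hive queries until the roll completes and it is renamed to its final name.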


          People

            Assignee: Unassigned
            Reporter: channingzong (Chang Zong)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: