Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.5.2
None
None
Description
I'm using the HDFS sink with the Snappy compression codec. While JSON events are being written to HDFS, a .snappy.tmp file is generated. If I try to read the data in that tmp file with Hive, I get a JSON parsing error.
I think the reason is that the HDFS sink has already written some Snappy-formatted content into the tmp file, but because the file is not finished, the Snappy stream is incomplete and cannot be read by the Hive JSON SerDe. After the file is rolled into a normal Snappy file, it can be processed correctly.
So is there a way to keep plain text format while data is being written to the tmp file, and convert it to Snappy format only after the tmp file is rolled?
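The suspected cause can be illustrated with a small sketch. It is not Flume or Hive code, and it uses Python's standard zlib module as a stand-in for Snappy (an assumption made only because zlib ships with Python). The helper names compress_events and try_read are hypothetical. The point is only that a compressed stream cut off partway, like a file still being written, may fail to decode even though the finished stream decodes fine.

```python
import json
import zlib


def compress_events(events):
    """Compress newline-delimited JSON events into one finished zlib stream."""
    payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return zlib.compress(payload)


def try_read(blob):
    """Return the decoded events, or None if the stream cannot be decompressed."""
    try:
        text = zlib.decompress(blob).decode("utf-8")
    except zlib.error:
        # Raised for an incomplete or truncated stream.
        return None
    return [json.loads(line) for line in text.splitlines()]


events = [{"id": i, "msg": "event %d" % i} for i in range(100)]

# A finished ("rolled") file: the whole compressed stream is present.
finished = compress_events(events)

# Simulated in-progress file: only the first half of the stream exists so far.
in_progress = finished[: len(finished) // 2]
```

Here try_read(finished) gives back the original events, while try_read(in_progress) returns None because the decompressor never sees the end of the stream. Whether the Hadoop Snappy codec fails in exactly this way on a .snappy.tmp file is an assumption, not something confirmed in this report.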