Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Environment: uname -a Linux myhost 4.4.0-63-generic #84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Description
I have some Parquet files that were created by a Java MR process (which I do not own). I can read these fields successfully in Pig and Spark, but for some reason the String fields are mangled when I view the files with parquet-tools (cat).
Here are the details on the file metadata using today's build of parquet-tools:
hadoop jar parquet-tools-1.9.1-SNAPSHOT.jar meta <hdfs>/parquet-r-00000
Output:
file: hdfs://<path>/parquet-r-00000
creator: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)

file schema: MY_DATA
--------------------------------------------------------------------------------
myfield: OPTIONAL BINARY R:0 D:1

row group 1: RC:37343 TS:32397576 OFFSET:4
--------------------------------------------------------------------------------
myfield: BINARY SNAPPY DO:0 FPO:4 SZ:273374/556406/2.04 VC:37343 ENC:RLE,BIT_PACKED,PLAIN_DICTIONARY ST:[no stats for this column]
Has anyone seen this before?