Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.8.0
Component/s: Storage - Parquet
Labels: None
A Parquet file with varchar columns (VARCHAR and VAR16CHAR) was created through Drill using CTAS.
parquet-tools shows the following schema and column chunk metadata for the created file:
VARCHAR_col: OPTIONAL BINARY O:UTF8 R:0 D:1
VAR16CHAR_col: OPTIONAL BINARY O:UTF8 R:0 D:1
VARCHAR_col: BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
VAR16CHAR_col: BINARY SNAPPY DO:0 FPO:910651 SZ:25830/381493/14.77 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
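The column chunk lines above can be read field by field. The following Python sketch parses them; the field meanings (DO = dictionary page offset, FPO = first data page offset, SZ = compressed size / uncompressed size / ratio, VC = value count, ENC = encodings) are an assumed reading of parquet-tools output, and the helper name `parse_chunk_line` is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed field meanings for a parquet-tools column chunk line:
# DO = dictionary page offset, FPO = first data page offset,
# SZ = compressed/uncompressed/ratio, VC = value count, ENC = encodings.
LINE_RE = re.compile(
    r"^(?P<name>\S+):\s+(?P<type>\S+)\s+(?P<codec>\S+)\s+"
    r"DO:(?P<do>\d+)\s+FPO:(?P<fpo>\d+)\s+"
    r"SZ:(?P<csz>\d+)/(?P<usz>\d+)/(?P<ratio>[\d.]+)\s+"
    r"VC:(?P<vc>\d+)\s+ENC:(?P<enc>\S+)$"
)


@dataclass
class ColumnChunkMeta:
    name: str
    physical_type: str
    codec: str
    dictionary_page_offset: int
    first_data_page_offset: int
    compressed_size: int
    uncompressed_size: int
    ratio: float
    value_count: int
    encodings: List[str]


def parse_chunk_line(line: str) -> ColumnChunkMeta:
    """Parse one column chunk line (hypothetical helper, not part of parquet-tools)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised column chunk line: {line!r}")
    return ColumnChunkMeta(
        name=m["name"],
        physical_type=m["type"],
        codec=m["codec"],
        dictionary_page_offset=int(m["do"]),
        first_data_page_offset=int(m["fpo"]),
        compressed_size=int(m["csz"]),
        uncompressed_size=int(m["usz"]),
        ratio=float(m["ratio"]),
        value_count=int(m["vc"]),
        encodings=m["enc"].split(","),
    )


if __name__ == "__main__":
    line = ("VARCHAR_col: BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 "
            "VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED")
    chunk = parse_chunk_line(line)
    # The printed ratio matches uncompressed / compressed, rounded to 2 places.
    print(chunk.name, round(chunk.uncompressed_size / chunk.compressed_size, 2))
    print("dictionary encoded:", "PLAIN_DICTIONARY" in chunk.encodings)
```

For VARCHAR_col, 231716 / 16344 rounds to 14.18 and for VAR16CHAR_col 381493 / 25830 rounds to 14.77, matching the printed ratios; both chunks list PLAIN_DICTIONARY among their encodings, which fits the observation that the problem involves dictionary encoding.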
When the file is queried, several records show corrupted data in these fields:
VAR16CHAR_col |
---------------
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
If dictionary encoding is turned off, the resulting file can be read without these issues.
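Since the corruption disappears when dictionary encoding is off, it may help to recall what a dictionary-encoded BINARY column contains. The sketch below is a simplified model, not Drill's or parquet-mr's code: distinct values are stored once in a dictionary page using PLAIN BYTE_ARRAY encoding (a 4-byte little-endian length followed by the bytes), and each row stores an index into that dictionary. Real data pages pack those indices with RLE/bit-packing, which is omitted here, and all function names are hypothetical.

```python
import struct
from typing import Dict, List, Tuple


def encode_plain_byte_arrays(values: List[bytes]) -> bytes:
    """PLAIN BYTE_ARRAY layout: 4-byte little-endian length, then the raw bytes."""
    out = bytearray()
    for v in values:
        out += struct.pack("<I", len(v))
        out += v
    return bytes(out)


def decode_plain_byte_arrays(buf: bytes, count: int) -> List[bytes]:
    """Read `count` length-prefixed values back out of a PLAIN buffer."""
    values = []
    pos = 0
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        values.append(buf[pos:pos + length])
        pos += length
    return values


def dictionary_encode(values: List[bytes]) -> Tuple[bytes, int, List[int]]:
    """Return (dictionary page bytes, dictionary size, per-row indices).

    Simplified model: indices are kept as a plain list instead of RLE/bit-packed.
    """
    index_of: Dict[bytes, int] = {}
    dictionary: List[bytes] = []
    indices: List[int] = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return encode_plain_byte_arrays(dictionary), len(dictionary), indices


def dictionary_decode(dict_page: bytes, dict_size: int, indices: List[int]) -> List[bytes]:
    """Rebuild every row by looking its index up in the decoded dictionary."""
    dictionary = decode_plain_byte_arrays(dict_page, dict_size)
    return [dictionary[i] for i in indices]


if __name__ == "__main__":
    rows = [s.encode("utf-8") for s in ["abc", "drill", "abc", "parquet", "drill"]]
    page, size, idx = dictionary_encode(rows)
    print(size, idx)
    # A correct reader must reproduce every row byte for byte.
    print(dictionary_decode(page, size, idx) == rows)
```

A correct reader must reproduce every row byte for byte from the dictionary and the indices; the corrupted VAR16CHAR_col values above are a failure of this round-trip property as seen from the query side.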