Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Using CTAS, I created a Parquet file through Drill containing columns of the varchar datatype.
Inspected with parquet-tools, the created Parquet file looks like this:
VARCHAR_col: OPTIONAL BINARY O:UTF8 R:0 D:1
VAR16CHAR_col: OPTIONAL BINARY O:UTF8 R:0 D:1
VARCHAR_col: BINARY SNAPPY DO:0 FPO:894307 SZ:16344/231716/14.18 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
VAR16CHAR_col: BINARY SNAPPY DO:0 FPO:910651 SZ:25830/381493/14.77 VC:378624 ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED
When the file is queried, several records show corrupted data for these fields:
VAR16CHAR_col |
---------------
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
�� |
������������ |
�������� |
����� |
�� |
If dictionary encoding is turned off, the resulting file can be read without these issues.
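
For reference, a rough sketch of that workaround in Drill SQL. It assumes dictionary encoding for the Parquet writer is controlled by the `store.parquet.enable_dictionary_encoding` session option. The workspace `dfs.tmp` and the table names `source_table` and `varchar_no_dict` are hypothetical placeholders, not the ones used in this report.

```sql
-- Assumption: this session option controls dictionary encoding in Drill's Parquet writer.
ALTER SESSION SET `store.parquet.enable_dictionary_encoding` = false;

-- Re-create the Parquet file with CTAS (hypothetical workspace and table names).
CREATE TABLE dfs.tmp.`varchar_no_dict` AS
SELECT VARCHAR_col, VAR16CHAR_col
FROM dfs.tmp.`source_table`;

-- Read the affected column back to check that the values are intact.
SELECT VAR16CHAR_col
FROM dfs.tmp.`varchar_no_dict`
LIMIT 25;
```

Running parquet-tools against the new file should then no longer list PLAIN_DICTIONARY among the encodings for these columns, if the option took effect.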