Details
- Bug
- Status: Resolved
- Minor
- Resolution: Fixed
- None
Description
I tried to print the description of a Parquet file with nested maps using the parquet-reader tool.
The file has the following schema:
required group field_id=0 spark_schema {
  optional group field_id=1 a (Map) {
    repeated group field_id=2 key_value {
      required binary field_id=3 key (String);
      optional group field_id=4 value (Map) {
        repeated group field_id=5 key_value {
          required int32 field_id=6 key;
          required boolean field_id=7 value;
        }
      }
    }
  }
  required int32 field_id=8 b;
  required double field_id=9 c;
}
When I print it using DebugPrint, I see:
$ ./parquet-reader nested_maps.snappy.parquet --only-metadata
<some text is omitted for the sake of readability>
Column 0: a.key_value.key (BYTE_ARRAY/UTF8)
Column 1: a.key_value.value.key_value.key (INT32)
Column 2: a.key_value.value.key_value.value (BOOLEAN)
Column 3: b (INT32)
Column 4: c (DOUBLE)
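The dotted names in the DebugPrint listing above follow from a depth-first walk over the nested schema. The sketch below illustrates that idea only: `Node`, `CollectLeafPaths`, `LeafPaths`, and `ExampleSchema` are hypothetical names introduced here and are not part of the Arrow or Parquet API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a Parquet schema node; not the Arrow API.
struct Node {
  std::string name;
  std::vector<Node> children;  // empty for a primitive (leaf) column
};

// Depth-first walk that records the dotted path of every leaf node.
void CollectLeafPaths(const Node& node, const std::string& prefix,
                      std::vector<std::string>* out) {
  const std::string path =
      prefix.empty() ? node.name : prefix + "." + node.name;
  if (node.children.empty()) {
    out->push_back(path);
    return;
  }
  for (const Node& child : node.children) {
    CollectLeafPaths(child, path, out);
  }
}

// Leaf paths below the root; the root name is not part of a column path.
std::vector<std::string> LeafPaths(const Node& root) {
  std::vector<std::string> out;
  for (const Node& child : root.children) {
    CollectLeafPaths(child, "", &out);
  }
  return out;
}

// The schema from this report, reduced to names and nesting.
Node ExampleSchema() {
  Node inner_kv{"key_value", {Node{"key", {}}, Node{"value", {}}}};
  Node inner_map{"value", {inner_kv}};
  Node outer_kv{"key_value", {Node{"key", {}}, inner_map}};
  Node a{"a", {outer_kv}};
  return Node{"spark_schema", {a, Node{"b", {}}, Node{"c", {}}}};
}
```

Calling `LeafPaths(ExampleSchema())` yields five paths in the same order as Columns 0 through 4 in the listing above.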
When I print it using JSONPrint, I see:
$ ./parquet-reader nested_maps.snappy.parquet --json
<some text is omitted for the sake of readability>
"Columns": [
  { "Id": "0", "Name": "key", "PhysicalType": "BYTE_ARRAY", "ConvertedType": "UTF8", "LogicalType": {"Type": "String"} },
  { "Id": "1", "Name": "key", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
  { "Id": "2", "Name": "value", "PhysicalType": "BOOLEAN", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
  { "Id": "3", "Name": "b", "PhysicalType": "INT32", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} },
  { "Id": "4", "Name": "c", "PhysicalType": "DOUBLE", "ConvertedType": "NONE", "LogicalType": {"Type": "None"} }
]
Column 0 and Column 1 have the same Name in the JSON output, which is very confusing. It would be more correct to output the full path of each column (for example, a.key_value.key instead of key).
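To make the ambiguity concrete, here is a minimal sketch that contrasts the leaf name with the full dotted path for the first two columns. `ColumnPath`, `LeafName`, and `DotPath` are hypothetical helpers written for this illustration; they are not the printer.cc code or the Arrow API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a column's path components; not the Arrow API.
using ColumnPath = std::vector<std::string>;

// Last path component only, as seen in the JSON output in this report.
std::string LeafName(const ColumnPath& path) {
  return path.empty() ? std::string() : path.back();
}

// All components joined with '.', as seen in the DebugPrint output.
std::string DotPath(const ColumnPath& path) {
  std::string out;
  for (std::size_t i = 0; i < path.size(); ++i) {
    if (i > 0) out += '.';
    out += path[i];
  }
  return out;
}
```

For Column 0 (`{"a", "key_value", "key"}`) and Column 1 (`{"a", "key_value", "value", "key_value", "key"}`), `LeafName` returns the same string for both, while `DotPath` returns distinct strings, so only the dotted path identifies a column unambiguously.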
This issue can be corrected by changing a single line: https://github.com/apache/arrow/blob/master/cpp/src/parquet/printer.cc#L218
The proposed patch is in the attachment.
Attachments