Description
Parquet supports some unsigned datatypes. Here is the related definition from parquet.thrift:
/**
 * Common types used by frameworks (e.g. hive, pig) using parquet. This helps map
 * between types in those frameworks to the base types in parquet. This is only
 * metadata and not needed to read or write the data.
 */

/**
 * An unsigned integer value.
 *
 * The number describes the maximum number of meaningful data bits in
 * the stored value. 8, 16 and 32 bit values are stored using the
 * INT32 physical type. 64 bit values are stored using the INT64
 * physical type.
 */
UINT_8 = 11;
UINT_16 = 12;
UINT_32 = 13;
UINT_64 = 14;
Spark does not support unsigned datatypes. Since SPARK-10113, we throw an exception with a clear message for them.
- UInt8: [0, 255]
- UInt16: [0, 65535]
- UInt32: [0, 4294967295]
- UInt64: [0, 18446744073709551615]
Unsigned types may be used to produce smaller in-memory representations of the data. If the stored value is larger than the maximum allowed by int32 or int64, the behavior is undefined.
In this ticket, we propose to read each unsigned type as a higher-precision signed type: uint8/16/32 as wider signed integers, and uint64 as a decimal (see the sub-tasks below).
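A minimal sketch of the widening idea, in plain Java (illustrative only, not Spark's actual reader code; the method names are hypothetical, and the Decimal(20, 0) mapping for uint64 follows the sub-task below):

```java
import java.math.BigInteger;

public class UnsignedWidening {
    // UINT_8 / UINT_16 / UINT_32 are physically stored as INT32. Reinterpret
    // the low bits as unsigned and widen to the next signed type that can
    // hold the full range.
    static short uint8ToShort(int stored) { return (short) (stored & 0xFF); }
    static int uint16ToInt(int stored)    { return stored & 0xFFFF; }
    static long uint32ToLong(int stored)  { return Integer.toUnsignedLong(stored); }

    // UINT_64 is physically stored as INT64. No wider signed primitive
    // exists, so the full range needs an arbitrary-precision value
    // (in Spark terms, a Decimal(20, 0)).
    static BigInteger uint64ToDecimal(long stored) {
        return new BigInteger(Long.toUnsignedString(stored));
    }

    public static void main(String[] args) {
        // The int32 bit pattern of -1 is UINT_32's maximum, 4294967295;
        // naive signed interpretation would silently return -1.
        System.out.println(uint32ToLong(-1));     // 4294967295
        // Likewise the int64 bit pattern of -1 is UINT_64's maximum.
        System.out.println(uint64ToDecimal(-1L)); // 18446744073709551615
    }
}
```

This makes the need for widening concrete: the stored signed bit pattern is only correct when reinterpreted as unsigned, and each unsigned width must be promoted to a strictly larger signed type.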
Issue Links
- is related to: SPARK-10113 Support for unsigned Parquet logical types (Resolved)

Sub-tasks
1. read parquet uint8/16/32 logical types (Resolved, Kent Yao 2)
2. read parquet uint64 as decimal (Resolved, Kent Yao 2)