Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 2.4.5, 3.0.0
- AIX 7.2; LinuxPPC64 with RedHat.
Description
While upgrading to Apache Spark 2.4.5 on our IBM systems (AIX and PowerPC) in order to read data stored in Parquet format, we noticed that values of the DOUBLE and DECIMAL types are parsed incorrectly.
According to the Parquet documentation, values are always stored in little-endian representation:
https://github.com/apache/parquet-format/blob/master/Encodings.md
The plain encoding is used whenever a more efficient encoding can not be used. It stores the data in the following format:
- BOOLEAN: Bit Packed, LSB first
- INT32: 4 bytes little endian
- INT64: 8 bytes little endian
- INT96: 12 bytes little endian (deprecated)
- FLOAT: 4 bytes IEEE little endian
- DOUBLE: 8 bytes IEEE little endian
- BYTE_ARRAY: length in 4 bytes little endian followed by the bytes contained in the array
- FIXED_LEN_BYTE_ARRAY: the bytes contained in the array
For native types, this outputs the data as little endian. Floating point types are encoded in IEEE. For the byte array type, it encodes the length as a 4 byte little endian, followed by the bytes.
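To illustrate why byte order matters here, the following is a minimal Python sketch of decoding PLAIN-encoded values with an explicit little-endian format. It is not Spark's reader code: the function names `decode_plain_double` and `decode_plain_byte_array` are hypothetical, and the "misread" value only simulates what a reader might see if it interpreted the same bytes in big-endian order, as could happen on AIX or PowerPC when the platform's native order is used.

```python
import struct


def decode_plain_double(buf: bytes, offset: int = 0) -> float:
    """Decode one PLAIN-encoded DOUBLE: 8 bytes, IEEE 754, little endian."""
    # "<" forces little-endian regardless of the host's native byte order.
    return struct.unpack_from("<d", buf, offset)[0]


def decode_plain_byte_array(buf: bytes, offset: int = 0) -> bytes:
    """Decode one PLAIN-encoded BYTE_ARRAY: 4-byte little-endian length, then the bytes."""
    (length,) = struct.unpack_from("<i", buf, offset)
    start = offset + 4
    return buf[start:start + length]


# Bytes as they would appear in a Parquet page for the DOUBLE value 1.5.
encoded = struct.pack("<d", 1.5)

# Correct: explicit little-endian decode, independent of the host platform.
correct = decode_plain_double(encoded)

# Simulated bug: the same bytes interpreted in big-endian order.
misread = struct.unpack(">d", encoded)[0]

print(correct, misread)
```

The point of the sketch is that the decoder names the byte order explicitly instead of inheriting it from the host, so the same file yields the same values on little-endian and big-endian machines.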
Attachments
Issue Links
- is caused by
- SPARK-26985 Test "access only some column of the all of columns " fails on big endian
- Resolved
- links to