Spark / SPARK-31703

Changes made by SPARK-26985 break reading parquet files correctly in BigEndian architectures (AIX + LinuxPPC64)



    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.5, 3.0.0
    • Fix Version/s: 2.4.7, 3.0.1, 3.1.0
    • Component/s: Spark Core
    • Environment:

      AIX 7.2
      LinuxPPC64 with RedHat.


      Trying to upgrade to Apache Spark 2.4.5 on our IBM systems (AIX and PowerPC) so that we can read data stored in Parquet format, we noticed that values of DOUBLE and DECIMAL types are parsed incorrectly.

      According to the Parquet documentation, values are always stored using a little-endian representation:

      The plain encoding is used whenever a more efficient encoding can not be used. It
      stores the data in the following format:
      BOOLEAN: Bit Packed, LSB first
      INT32: 4 bytes little endian
      INT64: 8 bytes little endian
      INT96: 12 bytes little endian (deprecated)
      FLOAT: 4 bytes IEEE little endian
      DOUBLE: 8 bytes IEEE little endian
      BYTE_ARRAY: length in 4 bytes little endian followed by the bytes contained in the array
      FIXED_LEN_BYTE_ARRAY: the bytes contained in the array
      For native types, this outputs the data as little endian. Floating
      point types are encoded in IEEE.
      For the byte array type, it encodes the length as a 4 byte little
      endian, followed by the bytes.
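Since the on-disk bytes are always little endian, a reader must force the byte order explicitly instead of using the platform's native order, which is big endian on AIX and ppc64 and produces the garbled values described above. A minimal sketch of decoding a PLAIN-encoded DOUBLE (the class and method names here are illustrative, not Spark's actual reader code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PlainDoubleDecoder {
    // Decode an 8-byte Parquet PLAIN-encoded DOUBLE starting at `offset`.
    // The order is forced to LITTLE_ENDIAN; relying on
    // ByteOrder.nativeOrder() would silently byte-swap the value on
    // big-endian platforms such as AIX 7.2 or Linux on ppc64.
    public static double readDouble(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 8)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getDouble();
    }

    public static void main(String[] args) {
        // 1.0 encoded as a little-endian IEEE double:
        // 00 00 00 00 00 00 f0 3f
        byte[] raw = {0, 0, 0, 0, 0, 0, (byte) 0xf0, 0x3f};
        System.out.println(readDouble(raw, 0)); // prints 1.0
    }
}
```

On a little-endian x86 machine the native order happens to match the file format, which is why a missing explicit order only surfaces as a bug on big-endian hardware.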


    • Attachments:
      Data_problem_Spark.gif (2.37 MB, Michail Giannakopoulos)

    • Assignee: Tin Hang To
    • Reporter: Michail Giannakopoulos
    • Votes: 0
    • Watchers: 7