Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 0.12.1
Fix Version/s: None
Environment:
* pyarrow 0.12.1
* numpy 1.16.1
* Python 3.7.0
* Intel Core i7-7820HQ
* (macOS 10.13.6)
Description
pa.serialize does not appear to properly encode the endianness of multi-byte data:
# roundtrip.py
import numpy as np
import pyarrow as pa

arr = np.array([1], dtype=np.dtype('>i2'))
buf = pa.serialize(arr).to_buffer()
result = pa.deserialize(buf)

print(f"Original: {arr.dtype.str}, deserialized: {result.dtype.str}")
np.testing.assert_array_equal(arr, result)
$ pipenv run python roundtrip.py
Original: >i2, deserialized: <i2
Traceback (most recent call last):
  File "roundtrip.py", line 10, in <module>
    np.testing.assert_array_equal(arr, result)
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 896, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/Users/gabejoseph/.local/share/virtualenvs/arrow-roundtrip-1xVSuBtp/lib/python3.7/site-packages/numpy/testing/_private/utils.py", line 819, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Arrays are not equal

Mismatch: 100%
Max absolute difference: 255
Max relative difference: 0.99609375
 x: array([1], dtype=int16)
 y: array([256], dtype=int16)
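The underlying bytes of the deserialized array are identical to the original (still big-endian), but the dtype Arrow assigns to the result does not reflect that byte order. It presumably uses the system's native endianness, which is little-endian on this machine, so the value 1 is read back as 256.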
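
For illustration, the misread can be reproduced with NumPy alone, without going through pyarrow. The sketch below reinterprets the big-endian bytes as little-endian, which gives the same 256 seen in the traceback. The last step (converting to native byte order before serializing) is only a possible workaround and an assumption on my part, not something confirmed by Arrow; the variable names are made up for the example.

```python
import numpy as np

# Big-endian int16 value 1 is stored as the two bytes 00 01.
arr = np.array([1], dtype=np.dtype(">i2"))
raw = arr.tobytes()

# Reinterpreting the same bytes with a little-endian dtype yields 256,
# matching the mismatch reported in the traceback.
misread = np.frombuffer(raw, dtype=np.dtype("<i2"))

# Possible workaround (an assumption, not taken from the report):
# convert to native byte order before serializing, so the stored bytes
# agree with the dtype a native-endian reader would assume.
native = arr.astype(arr.dtype.newbyteorder("="))

print(raw.hex(), misread[0], native[0], native.dtype.isnative)
```

The conversion copies the data and swaps bytes where needed, so the values are preserved but the original byte order is not recorded anywhere; a caller that needs `>i2` back would have to convert again after deserializing.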