Details
- Bug
- Status: Closed
- Major
- Resolution: Duplicate
- 0.17.1
- None
- None
Python 3.8.2
PyArrow 0.17.1
Pandas 1.0.3
Linux (Manjaro)
Description
I'm seeing very weird behavior when I try to store and retrieve a Pandas DataFrame using the Feather format. Simplified example:
>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
   scalar array
0       1   [1]
1       2   [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
   scalar                  array
0       1                   [16]
1       2  [1045468844972122628]
As you can see, the retrieved data is incorrect. I was originally trying to use the `feather-format` package directly (not through Pandas), and that didn't work either.
By varying the DataFrame being stored, I can also trigger other incorrect behavior, e.g. a larger list, an error saying the file size is incorrect, or simply a segmentation fault.
This is my first time using Feather/Arrow BTW.
Issue Links
- is duplicated by ARROW-8860 [C++] IPC/Feather decompression broken for nested arrays (Resolved)