Details
Type: Wish
Status: Resolved
Priority: Major
Resolution: Fixed
Environment: python 3.6.7, pandas 0.24.2, pyarrow 0.14.1 on WSL in Windows 10
Description
Hi,
Using Python 3.6.7, pandas 0.24.2 and pyarrow 0.14.1 on WSL in Windows 10, when I run:
```python
import pandas as pd
df = pd.DataFrame({'a': [1,2,3], 'b': [set([1,2]), set([2,3]), set([3,4,5])]})
df.to_feather('test.ft')
```
I get:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
    to_feather(self, fname)
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
    feather.write_feather(df, path)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
    writer.write(df)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 93, in write
    table = Table.from_pandas(df, preserve_index=False)
  File "pyarrow/table.pxi", line 1174, in pyarrow.lib.Table.from_pandas
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 496, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 487, in convert_column
    raise e
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 481, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 191, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Could not convert {1, 2} with type set: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column b with type object')
```
And obviously `df.drop('b', axis=1).to_feather('test.ft')` works.
Questions:
(1) Is it possible to support this kind of set/list column?
(2) Does anyone have an idea how to deal with this? I cannot unnest these set/list columns, as this would explode the DataFrame. My only other idea is to convert each set such as `{1,2}` into a string `1,2` and parse it back after reading the file, hoping it won't be slow.
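To make the string idea in (2) concrete, here is a minimal sketch of the round trip. The helper names `encode_int_set` and `decode_int_set` are made up for illustration and are not part of pandas or pyarrow; the sketch assumes every set holds only integers, and it says nothing about speed on a large DataFrame.

```python
def encode_int_set(values):
    """Serialize a set of ints to a string such as '1,2' (sorted for a stable result)."""
    return ",".join(str(v) for v in sorted(values))


def decode_int_set(text):
    """Parse a string produced by encode_int_set back into a set of ints."""
    # An empty string stands for the empty set; "".split(",") would yield [""].
    return {int(part) for part in text.split(",")} if text else set()


# Same values as column 'b' in the example above, kept as a plain list here
# so the sketch runs without pandas installed.
column_b = [{1, 2}, {2, 3}, {3, 4, 5}]

encoded = [encode_int_set(s) for s in column_b]
decoded = [decode_int_set(t) for t in encoded]

print(encoded)              # ['1,2', '2,3', '3,4,5']
print(decoded == column_b)  # True
```

With pandas, the same helpers could be applied as `df['b'] = df['b'].map(encode_int_set)` before `df.to_feather('test.ft')`, and `.map(decode_int_set)` on the column after `pd.read_feather('test.ft')`, so that column `b` holds only strings when it is written.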
Update:
With a list column the error is different:
```python
import pandas as pd
df = pd.DataFrame({'a': [1,2,3], 'b': [[1,2], [2,3], [3,4,5]]})
df.to_feather('test.ft')
```
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 2131, in to_feather
    to_feather(self, fname)
  File "/home/gioras/.local/lib/python3.6/site-packages/pandas/io/feather_format.py", line 83, in to_feather
    feather.write_feather(df, path)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 182, in write_feather
    writer.write(df)
  File "/home/gioras/.local/lib/python3.6/site-packages/pyarrow/feather.py", line 97, in write
    self.writer.write_array(name, col.data.chunk(0))
  File "pyarrow/feather.pxi", line 67, in pyarrow.lib.FeatherWriter.write_array
  File "pyarrow/error.pxi", line 93, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: list<item: int64>
```