Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version/s: 0.11.1, 0.12.0
Description
Nested NumPy arrays (object-dtype arrays whose elements are themselves arrays, used as the values) cannot be converted to a list-of-list type array:
    import numpy as np
    import pyarrow as pa

    arr = np.empty(2, dtype=object)
    arr[:] = [np.array([1, 2]), np.array([2, 3])]
    pa.array([arr, arr])

results in

    ArrowTypeError: only size-1 arrays can be converted to Python scalars
Starting from lists of lists works fine:
    lists = [[1, 2], [2, 3]]
    pa.array([lists, lists]).type

returns

    ListType(list<item: list<item: int64>>)
Specifying the type explicitly as pa.array([arr, arr], type=pa.list_(pa.list_(pa.int64()))) does not help.
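Because the list-of-lists path already works, one possible workaround is to convert the nested ndarrays to plain Python lists before calling pa.array. This is a minimal sketch, assuming NumPy is available; `to_nested_lists` is a hypothetical helper written for illustration, not a pyarrow API.

```python
import numpy as np


def to_nested_lists(obj):
    """Recursively replace NumPy arrays (including object arrays of arrays) with lists.

    Hypothetical helper for illustration; it is not part of pyarrow.
    """
    if isinstance(obj, np.ndarray):
        if obj.ndim == 0:
            # 0-d array: unwrap the single element and keep converting it.
            return to_nested_lists(obj.item())
        if obj.dtype == object:
            # tolist() would return the stored objects unchanged, so inner
            # arrays would survive; recurse element by element instead.
            return [to_nested_lists(item) for item in obj]
        # Numeric arrays: tolist() already yields nested lists of Python scalars.
        return obj.tolist()
    if isinstance(obj, (list, tuple)):
        return [to_nested_lists(item) for item in obj]
    return obj


# Rebuild the object array from the report, assigning one element at a time.
arr = np.empty(2, dtype=object)
arr[0] = np.array([1, 2])
arr[1] = np.array([2, 3])

nested = to_nested_lists([arr, arr])
print(nested)  # [[[1, 2], [2, 3]], [[1, 2], [2, 3]]]
```

The result consists only of Python lists, so passing it to pa.array is expected to follow the same inference path as the lists-of-lists example and produce list<list<int64>>. The helper recurses into object arrays because tolist() on such an array returns its elements as-is. It copies everything into Python objects, so it is a stopgap rather than a substitute for native nested-ndarray support.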
Because of this, a round-trip does not work either: converting a list-of-list array back to Python gives an array of arrays, which cannot be converted back to Arrow:
    In [2]: lists = [[1, 2], [2, 3]]
       ...: a = pa.array([lists, lists])

    In [3]: a.to_pandas()
    Out[3]:
    array([array([array([1, 2]), array([2, 3])], dtype=object),
           array([array([1, 2]), array([2, 3])], dtype=object)], dtype=object)

    In [4]: pa.array(a.to_pandas())
    ---------------------------------------------------------------------------
    ArrowTypeError                            Traceback (most recent call last)
    <ipython-input-4-9fee6dc9d0b8> in <module>
    ----> 1 pa.array(a.to_pandas())

    ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

    ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()

    ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

    ArrowTypeError: only size-1 arrays can be converted to Python scalars
Original report:
    In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})

    In [20]: df.iloc[0].to_dict()
    Out[20]: {'a': [[1], [2]], 'b': 1}

    In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
    Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}

    In [24]: np.array(df.iloc[0].to_dict()['a']).shape
    Out[24]: (2, 1)

    In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
    Out[25]: (2,)
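The round-trip wraps the values in extra array layers, which does not behave as expected: the original nested list becomes an array of shape (2, 1) when passed to np.array, while the value returned by to_pandas() is a 1-D object array of shape (2,).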
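The shape difference comes from how NumPy stores the two forms. The sketch below rebuilds both in plain NumPy (no pyarrow involved); the object array only mimics the structure shown in Out[21], and using np.stack to recover the 2-D shape is an illustrative choice that assumes all inner arrays have the same length.

```python
import numpy as np

# The original cell value: a regular nested list, which NumPy turns into a 2-D array.
original = [[1], [2]]
regular = np.array(original)

# Mimic the round-trip result: a 1-D object array whose elements are 1-element
# arrays. Elements are assigned one at a time so NumPy stores the inner arrays
# as objects instead of treating them as extra dimensions.
nested = np.empty(2, dtype=object)
nested[0] = np.array([1])
nested[1] = np.array([2])

print(regular.shape)  # (2, 1)
print(nested.shape)   # (2,)

# When every inner array has the same length, np.stack restores the 2-D shape.
restored = np.stack(nested)
print(restored.shape)                     # (2, 1)
print(np.array_equal(restored, regular))  # True
```

For ragged inner arrays np.stack raises instead, which is one reason an object array of arrays cannot simply be treated as a multi-dimensional array.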
More importantly, converting the result of a round-trip back to Arrow fails:
    In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})

    In [109]: df
    Out[109]:
                      a       b
    0  [[1, 2], [2, 3]]  [1, 2]
    1  [[1, 2], [2, 3]]  [2, 3]

    In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
    ---------------------------------------------------------------------------
    ArrowTypeError                            Traceback (most recent call last)
    <ipython-input-110-4a09836f807e> in <module>()
    ----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())

    /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
       1215     <pyarrow.lib.Table object at 0x7f05d1fb1b40>
       1216     """
    -> 1217     names, arrays, metadata = pdcompat.dataframe_to_arrays(
       1218         df,
       1219         schema=schema,

    /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
        379         arrays = [convert_column(c, t)
        380                   for c, t in zip(columns_to_convert,
    --> 381                                   convert_types)]
        382     else:
        383         from concurrent import futures

    /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
        374             e.args += ("Conversion failed for column {0!s} with type {1!s}"
        375                        .format(col.name, col.dtype),)
    --> 376             raise e
        377
        378         if nthreads == 1:

    ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
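At the DataFrame level, the same list conversion can be applied column by column before calling Table.from_pandas again. This is a workaround sketch, not the fix that resolved this issue; it assumes pandas and NumPy, and `nested_arrays_to_lists` and `_to_lists` are hypothetical helpers, not pandas or pyarrow APIs.

```python
import numpy as np
import pandas as pd


def _to_lists(value):
    # Recursively turn ndarrays (including object arrays of arrays) into lists.
    if isinstance(value, np.ndarray):
        if value.ndim == 0:
            return _to_lists(value.item())
        if value.dtype != object:
            return value.tolist()
        return [_to_lists(v) for v in value]
    if isinstance(value, (list, tuple)):
        return [_to_lists(v) for v in value]
    return value


def nested_arrays_to_lists(df):
    """Return a copy of df whose object columns hold plain Python lists.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    Non-object columns are left untouched.
    """
    out = df.copy()
    for name in out.columns:
        if out[name].dtype == object:
            out[name] = out[name].map(_to_lists)
    return out
```

Applied to the output of pa.Table.from_pandas(df).to_pandas(), the cleaned frame holds the same nested lists as the original df, so passing it to pa.Table.from_pandas is expected to take the list inference path that works for the original data. Restricting the conversion to object columns leaves numeric columns untouched.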
Issue Links
- relates to
  - ARROW-5645 [Python] Support inferring nested ndarray with ndim > 1 (Open)