Apache Arrow / ARROW-4350

[Python] dtype=object arrays cannot be converted to a list-of-list ListArray


Details

    Description

      An object-dtype NumPy array whose elements are themselves NumPy arrays cannot be converted to a list-of-list Arrow array:

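      import numpy as np
      import pyarrow as pa

      # Object array whose two elements are themselves NumPy arrays
      arr = np.empty(2, dtype=object)
      arr[:] = [np.array([1, 2]), np.array([2, 3])]

      pa.array([arr, arr])  # raises ArrowTypeError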
      

      results in

      ArrowTypeError: only size-1 arrays can be converted to Python scalars
      

      Starting from lists of lists works fine:

      lists = [[1, 2], [2, 3]]
      pa.array([lists, lists]).type
      
      ListType(list<item: list<item: int64>>)
      

      Specifying the type explicitly as pa.array([arr, arr], type=pa.list_(pa.list_(pa.int64()))) does not help.

      Because of this, a round trip fails: calling to_pandas() on a list-of-list array returns an object array of arrays, which pa.array() cannot convert back:

      In [2]: lists = [[1, 2], [2, 3]]
         ...: a = pa.array([lists, lists])

      In [3]: a.to_pandas()
      Out[3]:
      array([array([array([1, 2]), array([2, 3])], dtype=object),
             array([array([1, 2]), array([2, 3])], dtype=object)], dtype=object)

      In [4]: pa.array(a.to_pandas())
      ---------------------------------------------------------------------------
      ArrowTypeError                            Traceback (most recent call last)
      <ipython-input-4-9fee6dc9d0b8> in <module>
      ----> 1 pa.array(a.to_pandas())
      
      ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
      
      ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()
      
      ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowTypeError: only size-1 arrays can be converted to Python scalars
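
One possible workaround, sketched here under assumptions rather than taken from the report: convert the nested arrays back to plain Python lists before calling pa.array, since the list-of-lists input is shown to work earlier in this description. The helper name to_nested_lists is hypothetical. It relies only on the .tolist() method that NumPy arrays and scalars provide, so the sketch itself needs no third-party imports.

```python
def to_nested_lists(value):
    """Recursively turn array-like containers into plain Python lists.

    Hypothetical helper, not part of pyarrow or NumPy.
    """
    # NumPy arrays and NumPy scalars expose .tolist(). For an object-dtype
    # array, .tolist() leaves the inner arrays untouched, hence the recursion.
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        return [to_nested_lists(item) for item in value]
    return value


# Plain containers pass through as nested lists.
nested = to_nested_lists(((1, 2), (2, 3)))
print(nested)  # [[1, 2], [2, 3]]
```

With the arrays from this report, pa.array(to_nested_lists(a.to_pandas())) would then take the list-of-lists path. This has not been checked against any particular pyarrow version.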
      

      Original report:

      In [19]: df = pd.DataFrame({'a': [[[1], [2]], [[2], [3]]], 'b': [1, 2]})
      
      In [20]: df.iloc[0].to_dict()
      Out[20]: {'a': [[1], [2]], 'b': 1}
      
      In [21]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()
      Out[21]: {'a': array([array([1]), array([2])], dtype=object), 'b': 1}
      
      In [24]: np.array(df.iloc[0].to_dict()['a']).shape
      Out[24]: (2, 1)
      
      In [25]: pa.Table.from_pandas(df).to_pandas().iloc[0].to_dict()['a'].shape
      Out[25]: (2,)
      

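      The round trip does not preserve the nesting: the nested lists come back as an object array of arrays, so the shape changes from (2, 1) to (2,).

      More importantly, the following round trip fails: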

      In [108]: df = pd.DataFrame({'a': [[[1, 2],[2, 3]], [[1,2], [2, 3]]], 'b': [[1, 2],[2, 3]]})
      
      In [109]: df
      Out[109]:
                        a       b
      0  [[1, 2], [2, 3]]  [1, 2]
      1  [[1, 2], [2, 3]]  [2, 3]
      
      In [110]: pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
      ---------------------------------------------------------------------------
      ArrowTypeError Traceback (most recent call last)
      <ipython-input-110-4a09836f807e> in <module>()
      ----> 1 pa.Table.from_pandas(pa.Table.from_pandas(df).to_pandas())
      
      /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()
      1215 <pyarrow.lib.Table object at 0x7f05d1fb1b40>
      1216 """
      -> 1217 names, arrays, metadata = pdcompat.dataframe_to_arrays(
      1218 df,
      1219 schema=schema,
      
      /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
      379 arrays = [convert_column(c, t)
      380 for c, t in zip(columns_to_convert,
      --> 381 convert_types)]
      382 else:
      383 from concurrent import futures
      
      /Users/pengyu/.pyenv/virtualenvs/starscream/2.7.11/lib/python2.7/site-packages/pyarrow/pandas_compat.pyc in convert_column(col, ty)
      374 e.args += ("Conversion failed for column {0!s} with type {1!s}"
      375 .format(col.name, col.dtype),)
      --> 376 raise e
      377
      378 if nthreads == 1:
      
      ArrowTypeError: ('only size-1 arrays can be converted to Python scalars', 'Conversion failed for column a with type object')
      
      

       

            People

              wesm Wes McKinney
              yupbank yu peng

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 50m