Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
ARROW-2073 implemented creating a StructArray from an array of tuples (in addition to from dicts).
This works in pyarrow.array (specifying the proper type):
In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})

In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])

In [4]: pa.array(df['tuples'], type=struct_type)
Out[4]:
<pyarrow.lib.StructArray object at 0x7f1b02ff6818>
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    3
  ]
-- child 1 type: int64
  [
    2,
    4
  ]
However, it does not yet work when converting a DataFrame to a Table with the type specified in a schema:
In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
     68     try:
---> 69         return logical_type_map[arrow_type.id]
     70     except KeyError:

KeyError: 24

During handling of the above exception, another exception occurred:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-5-c18748f9b954> in <module>
----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))

~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
    483     metadata = construct_metadata(df, column_names, index_columns,
    484                                   index_descriptors, preserve_index,
--> 485                                   types)
    486     return all_names, arrays, metadata
    487

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df, column_names, index_levels, index_descriptors, preserve_index, types)
    207         metadata = get_column_metadata(df[col_name], name=sanitized_name,
    208                                        arrow_type=arrow_type,
--> 209                                        field_name=sanitized_name)
    210         column_metadata.append(metadata)
    211

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type, field_name)
    149     dict
    150     """
--> 151     logical_type = get_logical_type(arrow_type)
    152
    153     string_dtype, extra_metadata = get_extension_dtype_info(column)

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
     77     elif isinstance(arrow_type, pa.lib.Decimal128Type):
     78         return 'decimal'
---> 79     raise NotImplementedError(str(arrow_type))
     80
     81

NotImplementedError: struct<a: int64, b: int64>
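The traceback shows the conversion itself is not what fails. The error comes from building the pandas metadata: the struct type's id (24) has no entry in logical_type_map, and none of the fallback branches match it. The sketch below is a simplified, hypothetical model of that lookup pattern, written with the standard library only. The map contents, the int64 id, and both function names are assumptions for illustration, not pyarrow's actual code, and the 'object' fallback is just one conceivable way to handle the struct case, not necessarily the fix that was applied.

```python
# Simplified, hypothetical model of the lookup seen in the traceback.
# NOT pyarrow's real code: names, map entries, and the int64 id are made up.

INT64_ID = 9      # hypothetical id, for illustration only
STRUCT_ID = 24    # the id reported by "KeyError: 24" in the traceback

# Hypothetical map from Arrow type id to a pandas-metadata logical type name.
LOGICAL_TYPE_MAP = {INT64_ID: "int64"}


def get_logical_type_model(type_id, type_name):
    """Return a logical type name, or raise if the type id is unmapped."""
    try:
        return LOGICAL_TYPE_MAP[type_id]
    except KeyError:
        # The real function tries a few special cases at this point (the
        # traceback shows a Decimal128Type branch); struct matches none.
        raise NotImplementedError(type_name)


def get_logical_type_with_fallback(type_id, type_name):
    """One conceivable fix (an assumption): fall back to a generic name."""
    try:
        return get_logical_type_model(type_id, type_name)
    except NotImplementedError:
        return "object"


print(get_logical_type_with_fallback(STRUCT_ID, "struct<a: int64, b: int64>"))
```

Raising inside the except block chains the two exceptions, which is why the report shows a KeyError followed by "During handling of the above exception, another exception occurred".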