Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.16.0
- None
Description
While Table.from_pandas() seems to work as expected with extension types,
Schema.from_pandas() raises an ArrowTypeError:
    import pandas as pd
    import pyarrow as pa

    df = pd.DataFrame({
        "x": pd.Series([1, 2, None], dtype="Int8"),
        "y": pd.Series(["a", "b", None], dtype="category"),
        "z": pd.Series(["ab", "bc", None], dtype="string"),
    })
    print(pa.Table.from_pandas(df).schema)
    print(pa.Schema.from_pandas(df))
Results in:
    x: int8
    y: dictionary<values=string, indices=int8, ordered=0>
    z: string
    metadata
    --------
    {b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
                b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
                b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
                b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
                b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
                b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
                b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
                b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
                b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
                b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
                b'as_version": "1.0.3"}'}
    ---------------------------------------------------------------------------
    ArrowTypeError                            Traceback (most recent call last)
    ...
    ArrowTypeError: Did not pass numpy.dtype object
I'd imagine Table.from_pandas(df).schema and Schema.from_pandas(df) should result in the exact same object?
Issue Links
- is duplicated by ARROW-8159 [Python] pyarrow.Schema.from_pandas doesn't support ExtensionDtype (Resolved)