Details
Bug
Status: Open
Major
Resolution: Unresolved
0.14.1
None
None
Description
I want to serialize PyTorch tensors, but since Arrow does not support them yet, I convert them to NumPy arrays with t.numpy() (https://pytorch.org/docs/stable/tensors.html?highlight=numpy#torch.Tensor.numpy), which returns an ndarray. My tensors are 1-dimensional, so the result is a 1-dimensional ndarray.
With such an array stored in a DataFrame cell, calling df.to_feather("fname.feather") yields pyarrow.lib.ArrowNotImplementedError: list<item: float>.
Next I tried wrapping the array as pyarrow.array(t.numpy()) before storing it in the DataFrame, which results in pyarrow.lib.ArrowInvalid: ('Could not convert [\n 0.00500498,\n -0.00732583,\n... with type pyarrow.lib.FloatArray: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object').
I would appreciate it if this worked more out of the box.
As requested, here is a full example:
import torch
import pyarrow
import pandas as pd

pd.DataFrame([[torch.ones(2)]], columns=["0"]).to_feather("fname.feather")
pd.DataFrame([[torch.ones(2).numpy()]], columns=["0"]).to_feather("fname.feather")
pd.DataFrame([[pyarrow.array(torch.ones(2).numpy())]], columns=["0"]).to_feather("fname.feather")
ArrowInvalid: ('Could not convert tensor([1., 1.]) with type Tensor: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object')
ArrowNotImplementedError: list<item: float>
ArrowInvalid: ('Could not convert [\n 1,\n 1\n] with type pyarrow.lib.FloatArray: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column 0 with type object')
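A possible interim workaround, sketched here as an assumption rather than a confirmed fix: since the failures above all involve a column whose cells hold array-like objects, one option is to expand each 1-dimensional vector into one scalar column per element before calling to_feather. The helper name expand_vectors and the column prefix are hypothetical, and the sketch assumes all vectors have the same length. A NumPy array stands in for torch.ones(2).numpy() so that the snippet does not require PyTorch.

```python
import numpy as np
import pandas as pd


def expand_vectors(vectors, prefix="v"):
    """Hypothetical helper: one row per vector, one scalar column per element."""
    # np.stack raises ValueError if the vectors differ in length.
    matrix = np.stack([np.asarray(v) for v in vectors])
    columns = [f"{prefix}_{i}" for i in range(matrix.shape[1])]
    return pd.DataFrame(matrix, columns=columns)


# Stand-ins for torch.ones(2).numpy() and torch.zeros(2).numpy().
vectors = [np.ones(2, dtype=np.float32), np.zeros(2, dtype=np.float32)]
df = expand_vectors(vectors)
print(df.dtypes)

# Every column now holds plain float32 scalars rather than Python objects,
# so a subsequent df.to_feather("fname.feather") no longer has to infer a
# type for an object column (not verified here against pyarrow 0.14.1).
```

This gives up the single list-valued column, so it only fits when the vector length is fixed and small enough to be practical as separate columns.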