Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Not A Problem
Affects Version: 0.8.0
Fix Version: None
Description
The pandas_type recorded for bytes or unicode columns of an empty pandas DataFrame is unexpectedly float64:
import numpy as np
import pandas as pd
import pyarrow as pa
import json

empty_df = pd.DataFrame({'unicode': np.array([], dtype=np.unicode_),
                         'bytes': np.array([], dtype=np.bytes_)})
empty_table = pa.Table.from_pandas(empty_df)
json.loads(empty_table.schema.metadata[b'pandas'])['columns']
# Same behavior for input dtype np.unicode_

[{u'field_name': u'bytes',
  u'metadata': None,
  u'name': u'bytes',
  u'numpy_type': u'object',
  u'pandas_type': u'float64'},
 {u'field_name': u'unicode',
  u'metadata': None,
  u'name': u'unicode',
  u'numpy_type': u'object',
  u'pandas_type': u'float64'},
 {u'field_name': u'__index_level_0__',
  u'metadata': None,
  u'name': None,
  u'numpy_type': u'int64',
  u'pandas_type': u'int64'}]
Tested on Debian 8 with Python 2.7 and Python 3.6.4.
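
For anyone who wants to detect this mismatch in their own tables, the following is a minimal sketch that scans the 'columns' metadata for entries whose numpy_type is object but whose pandas_type is numeric. It uses only the standard library. The helper name find_suspect_columns and the set of numeric type names are illustrative assumptions, not part of pyarrow, and the sample data is copied from the output in this report.

```python
import json

# Sample copied from the output in this report. In practice the list would come
# from json.loads(table.schema.metadata[b'pandas'])['columns'].
SAMPLE_COLUMNS = json.loads("""
[
  {"field_name": "bytes", "metadata": null, "name": "bytes",
   "numpy_type": "object", "pandas_type": "float64"},
  {"field_name": "unicode", "metadata": null, "name": "unicode",
   "numpy_type": "object", "pandas_type": "float64"},
  {"field_name": "__index_level_0__", "metadata": null, "name": null,
   "numpy_type": "int64", "pandas_type": "int64"}
]
""")

# Assumed list of numeric pandas_type names for this check (hypothetical,
# chosen for illustration only).
NUMERIC_PANDAS_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
}


def find_suspect_columns(columns):
    """Return field names of object-dtype columns tagged with a numeric pandas_type."""
    return [
        col["field_name"]
        for col in columns
        if col["numpy_type"] == "object"
        and col["pandas_type"] in NUMERIC_PANDAS_TYPES
    ]


print(find_suspect_columns(SAMPLE_COLUMNS))  # ['bytes', 'unicode']
```

The index column is not flagged because its numpy_type and pandas_type agree.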