Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
From report on the pandas issue tracker: https://github.com/pandas-dev/pandas/issues/28252
With the latest released versions of fastparquet (0.3.2) and pyarrow (0.14.1), a file written with pandas using the fastparquet engine cannot be read back with the pyarrow engine:
import pandas as pd

# Write with the fastparquet engine, then read back with the pyarrow engine.
df = pd.DataFrame({'A': [1, 2, 3]})
df.to_parquet("test.parquet", engine="fastparquet", compression=None)
pd.read_parquet("test.parquet", engine="pyarrow")
gives the following error when reading:
----> 1 pd.read_parquet("test.parquet", engine="pyarrow")

~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in read_parquet(path, engine, columns, **kwargs)
    292
    293     impl = get_engine(engine)
--> 294     return impl.read(path, columns=columns, **kwargs)

~/miniconda3/lib/python3.7/site-packages/pandas/io/parquet.py in read(self, path, columns, **kwargs)
    123         kwargs["use_pandas_metadata"] = True
    124         result = self.api.parquet.read_table(
--> 125             path, columns=columns, **kwargs
    126         ).to_pandas()
    127         if should_close:

~/miniconda3/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()

~/miniconda3/lib/python3.7/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()

~/miniconda3/lib/python3.7/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
    642         column_indexes = pandas_metadata.get('column_indexes', [])
    643         index_descriptors = pandas_metadata['index_columns']
--> 644         table = _add_any_metadata(table, pandas_metadata)
    645         table, index = _reconstruct_index(table, index_descriptors,
    646                                           all_columns)

~/miniconda3/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _add_any_metadata(table, pandas_metadata)
    965                 raw_name = 'None'
    966
--> 967             idx = schema.get_field_index(raw_name)
    968             if idx != -1:
    969                 if col_meta['pandas_type'] == 'datetimetz':

~/miniconda3/lib/python3.7/site-packages/pyarrow/types.pxi in pyarrow.lib.Schema.get_field_index()

~/miniconda3/lib/python3.7/site-packages/pyarrow/lib.cpython-37m-x86_64-linux-gnu.so in string.from_py.__pyx_convert_string_from_py_std__in_string()

TypeError: expected bytes, dict found
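The last frames of the traceback show that schema.get_field_index() received a dict where a column name string was expected. One way to narrow this down might be to scan the pandas metadata stored in the file (the JSON under the "pandas" key of the Parquet key-value metadata) for name entries that are not strings. The sketch below is only illustrative: find_non_string_names is a hypothetical helper, and the sample metadata is made up for the example rather than taken from a file written by fastparquet.

```python
import json


def find_non_string_names(pandas_metadata):
    """Return (section, position, key, value) for every name entry
    that is neither a string nor None.

    Hypothetical diagnostic helper, not part of pandas, pyarrow or fastparquet.
    """
    problems = []
    for section in ("columns", "column_indexes"):
        for pos, entry in enumerate(pandas_metadata.get(section, [])):
            if not isinstance(entry, dict):
                continue
            for key in ("name", "field_name"):
                value = entry.get(key)
                if value is not None and not isinstance(value, str):
                    problems.append((section, pos, key, value))
    return problems


# Made-up metadata for illustration only; the second column carries a
# dict as its name, which is the kind of value the TypeError complains about.
sample = json.loads("""
{
  "index_columns": [],
  "column_indexes": [],
  "columns": [
    {"name": "A", "field_name": "A", "pandas_type": "int64"},
    {"name": {"made": "up"}, "pandas_type": "int64"}
  ]
}
""")

problems = find_non_string_names(sample)
for section, pos, key, value in problems:
    print(f"{section}[{pos}].{key} is {type(value).__name__}: {value!r}")
```

For a real file, the dict passed to the helper would be the decoded JSON from the file's "pandas" metadata entry; any entry it reports is a candidate for the value that reached get_field_index().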