Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version: 1.0.1
Environment: CentOS7; conda environment with pyarrow 1.0.1, numpy 1.19.1 and pandas 1.1.1
Description
pyarrow writes an invalid Feather v2 file that it cannot read back afterwards. Reading the file fails with:
OSError: Verification of flatbuffer-encoded Footer failed.
The following code reproduces the problem for me:
import numpy as np
import pandas as pd
import pyarrow.feather as pf
nbr_regions = 1223024
nbr_motifs = 4891
# Create (big) dataframe.
df = pd.DataFrame(
    np.arange(nbr_regions * nbr_motifs, dtype=np.float32).reshape((nbr_regions, nbr_motifs)),
    index=pd.Index(['region' + str(i) for i in range(nbr_regions)], name='regions'),
    columns=pd.Index(['motif' + str(i) for i in range(nbr_motifs)], name='motifs')
)
# Transpose dataframe.
df_transposed = df.transpose()
# Write transposed dataframe to Feather v2 format.
pf.write_feather(df_transposed, 'df_transposed.feather')
# Reading the transposed dataframe back from Feather v2 format raises this error:
df_transposed_read = pf.read_feather('df_transposed.feather')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-64-b41ad5157e77> in <module>
----> 1 df_transposed_read = pf.read_feather('df_transposed.feather')
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
    213     """
    214     _check_pandas_version()
--> 215     return (read_table(source, columns=columns, memory_map=memory_map)
    216             .to_pandas(use_threads=use_threads))
    217
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
    235     """
    236     reader = ext.FeatherReader()
--> 237     reader.open(source, use_memory_map=memory_map)
    238
    239     if columns is None:
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.open()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
OSError: Verification of flatbuffer-encoded Footer failed.
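The repro is memory-hungry: the dense float32 array behind `df` alone needs 1,223,024 x 4,891 x 4 bytes, roughly 22.3 GiB, not counting any copies that `transpose()` or the Arrow conversion may make. A quick sketch of that arithmetic (`dense_nbytes` is a hypothetical helper, not part of numpy or pandas):

```python
import numpy as np

# Dimensions taken from the repro above.
nbr_regions = 1223024
nbr_motifs = 4891

def dense_nbytes(n_rows: int, n_cols: int, dtype=np.float32) -> int:
    """Bytes needed for one dense n_rows x n_cols array of `dtype`."""
    return n_rows * n_cols * np.dtype(dtype).itemsize

nbytes = dense_nbytes(nbr_regions, nbr_motifs)
print(f"{nbytes:,} bytes ~= {nbytes / 2**30:.1f} GiB")  # 23,927,241,536 bytes ~= 22.3 GiB
```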
I later found that the same error occurs when the original dataframe is created directly in the transposed shape, without calling transpose():
# Create (big) dataframe.
df_without_transpose = pd.DataFrame(
    np.arange(nbr_motifs * nbr_regions, dtype=np.float32).reshape((nbr_motifs, nbr_regions)),
    index=pd.Index(['motif' + str(i) for i in range(nbr_motifs)], name='motifs'),
    columns=pd.Index(['region' + str(i) for i in range(nbr_regions)], name='regions'),
)
pf.write_feather(df_without_transpose, 'df_without_transpose.feather')
df_without_transpose_read = pf.read_feather('df_without_transpose.feather')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-91-3cdad1d58c35> in <module>
----> 1 df_without_transpose_read = pf.read_feather('df_without_transpose.feather')
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_feather(source, columns, use_threads, memory_map)
    213     """
    214     _check_pandas_version()
--> 215     return (read_table(source, columns=columns, memory_map=memory_map)
    216             .to_pandas(use_threads=use_threads))
    217
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.py in read_table(source, columns, memory_map)
    235     """
    236     reader = ext.FeatherReader()
--> 237     reader.open(source, use_memory_map=memory_map)
    238
    239     if columns is None:
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/feather.pxi in pyarrow.lib.FeatherReader.open()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
/software/miniconda3/envs/pyarrow/lib/python3.8/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
OSError: Verification of flatbuffer-encoded Footer failed.
Writing to Feather v1 format works:
pf.write_feather(df_transposed, 'df_transposed.v1.feather', version=1)
df_transposed_read_v1 = pf.read_feather('df_transposed.v1.feather')
# Now do the same, but also save the index in the Feather v1 file.
df_transposed_reset_index = df_transposed.reset_index()
pf.write_feather(df_transposed_reset_index, 'df_transposed_reset_index.v1.feather', version=1)
df_transposed_reset_index_read_v1 = pf.read_feather('df_transposed_reset_index.v1.feather')
# Returns True
df_transposed_reset_index_read_v1.equals(df_transposed)
Issue Links
- relates to: ARROW-11559 [C++] Improve flatbuffers verification limits (Resolved)
- links to