Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Done
- Fix Version: 0.12.1
- Component: None
- Environment: Mac, Linux
Description
According to https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files, writing a parquet to S3 with `partition_cols` should work, but it fails for me. Example script:
```python
import pandas as pd
import sys

print(sys.version)
print(pd.__version__)

df = pd.DataFrame([{'a': 1, 'b': 2}])
df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow')
print('OK 1')
df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
print('OK 2')
```
Output:
```
3.5.2 (default, Feb 14 2019, 01:46:27)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
0.24.2
OK 1
Traceback (most recent call last):
  File "./t.py", line 14, in <module>
    df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow')
  File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/core/frame.py", line 2203, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 252, in to_parquet
    partition_cols=partition_cols, **kwargs)
  File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pandas/io/parquet.py", line 118, in write
    partition_cols=partition_cols, **kwargs)
  File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1227, in write_to_dataset
    _mkdir_if_not_exists(fs, root_path)
  File "/Users/vshih/.pyenv/versions/3.5.2/lib/python3.5/site-packages/pyarrow/parquet.py", line 1182, in _mkdir_if_not_exists
    if fs._isfilestore() and not fs.exists(path):
AttributeError: 'NoneType' object has no attribute '_isfilestore'
```
Original issue - https://github.com/apache/arrow/issues/4030