Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.10.0, 0.11.0, 0.11.1
- Environment: Python 3.6.3, OSX 10.14
Description
From: https://stackoverflow.com/questions/53214288/merging-parquet-files-pandas-meta-in-schema-mismatch
I am trying to merge multiple Parquet files into one. Their schemas are identical field-wise, but my ParquetWriter complains that they are not. After some investigation I found that the pandas metadata in the schemas differs, which causes this error.
Sample:

    import pyarrow.parquet as pq

    # files, MESS_DIR and COMPRESSED_FILE are defined elsewhere in the script
    writer = None
    pq_tables = []
    for file_ in files:
        pq_table = pq.read_table(f'{MESS_DIR}/{file_}')
        pq_tables.append(pq_table)
        if writer is None:
            # The file schema is taken from the first table read
            writer = pq.ParquetWriter(COMPRESSED_FILE, schema=pq_table.schema,
                                      use_deprecated_int96_timestamps=True)
        writer.write_table(table=pq_table)
The error:

    Traceback (most recent call last):
      File "{PATH_TO}/main.py", line 68, in lambda_handler
        writer.write_table(table=pq_table)
      File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyarrow/parquet.py", line 335, in write_table
        raise ValueError(msg)
    ValueError: Table schema does not match schema used to create file:
Issue Links
- is duplicated by
  - ARROW-3918 [Python] ParquetWriter.write_table doesn't support coerce_timestamps or allow_truncated_timestamps (Closed)
- links to