Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- None
Description
With regard to schema evolution / normalization, we support inserting nulls for a missing column, changing nullability, and normalizing column order, but we do not yet seem to support promoting the null type to any other type.
Small Python example:
In [11]: df = pd.DataFrame({"col": np.array([None, None, None, None], dtype='object')})
    ...: df.to_parquet("test_filter_schema.parquet", engine="pyarrow")
    ...:
    ...: import pyarrow.dataset as ds
    ...: dataset = ds.dataset("test_filter_schema.parquet", format="parquet", schema=pa.schema([("col", pa.int64())]))
    ...: dataset.to_table()
...
~/scipy/repos/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.Dataset.to_table()

~/scipy/repos/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.Scanner.to_table()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: fields had matching names but differing types. From: col: null To: col: int64
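To make the requested behaviour concrete, here is a rough pure-Python model of the normalization rules described above. It is only a sketch and not the Arrow implementation: the function name `normalize_fragment`, the `NULL_TYPE` constant, and the representation of columns as `(type_name, values)` tuples are all hypothetical and chosen for illustration. The one new rule compared with the current behaviour is the `NULL_TYPE` branch, which relabels an all-null column with the target type instead of raising.

```python
NULL_TYPE = "null"  # hypothetical stand-in for Arrow's null type


def normalize_fragment(columns, num_rows, target_schema):
    """Project one fragment's columns onto the dataset schema.

    columns:       dict mapping column name -> (type_name, list of values)
    num_rows:      number of rows in the fragment
    target_schema: list of (column name, type_name) in the desired order
    """
    out = {}
    # Iterating over the target schema normalizes the column order.
    for name, target_type in target_schema:
        if name not in columns:
            # Missing column: insert nulls of the target type.
            out[name] = (target_type, [None] * num_rows)
            continue
        source_type, values = columns[name]
        if source_type == target_type:
            out[name] = (target_type, values)
        elif source_type == NULL_TYPE:
            # Proposed rule: a null-typed column holds only nulls,
            # so it can be promoted to any other type.
            out[name] = (target_type, [None] * len(values))
        else:
            raise TypeError(
                "fields had matching names but differing types. "
                f"From: {name}: {source_type} To: {name}: {target_type}"
            )
    return out


# The all-None column from the example above, read with an int64 schema.
fragment = {"col": (NULL_TYPE, [None, None, None, None])}
result = normalize_fragment(fragment, 4, [("col", "int64")])
```

In this model the read succeeds and `result["col"]` is an int64 column of four nulls, while a genuine mismatch such as string to int64 still raises.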
Attachments
Issue Links
- is depended upon by
  - ARROW-2659 [Python] More graceful reading of empty String columns in ParquetDataset (Open)
  - ARROW-2860 [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read (Open)
- links to