Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
8.0.0
Description
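Our current minimal_build examples for Python build with -DARROW_PARQUET=ON but without -DARROW_DATASET=ON. Because pq.write_to_dataset imports pyarrow.dataset, which depends on the compiled pyarrow._dataset extension module, the Python tests then fail with: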
_________________________________________________________ test_partitioned_dataset[True] _________________________________________________________

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_partitioned_dataset_True_0'), use_legacy_dataset = True

    @pytest.mark.pandas
    @parametrize_legacy_dataset
    def test_partitioned_dataset(tempdir, use_legacy_dataset):
        # ARROW-3208: Segmentation fault when reading a Parquet partitioned dataset
        # to a Parquet file
        path = tempdir / "ARROW-3208"
        df = pd.DataFrame({
            'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
            'two': [-1, 10, 2, 100, 1000, 1, 11],
            'three': [0, 0, 0, 0, 0, 0, 0]
        })
        table = pa.Table.from_pandas(df)
>       pq.write_to_dataset(table, root_path=str(path),
                            partition_cols=['one', 'two'])

pyarrow/tests/parquet/test_dataset.py:1544:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/parquet/__init__.py:3110: in write_to_dataset
    import pyarrow.dataset as ds
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    """Dataset is currently unstable. APIs subject to change without notice."""

    import pyarrow as pa
    from pyarrow.util import _is_iterable, _stringify_path, _is_path_like

>   from pyarrow._dataset import (  # noqa
        CsvFileFormat,
        CsvFragmentScanOptions,
        Dataset,
        DatasetFactory,
        DirectoryPartitioning,
        FilenamePartitioning,
        FileFormat,
        FileFragment,
        FileSystemDataset,
        FileSystemDatasetFactory,
        FileSystemFactoryOptions,
        FileWriteOptions,
        Fragment,
        FragmentScanOptions,
        HivePartitioning,
        IpcFileFormat,
        IpcFileWriteOptions,
        InMemoryDataset,
        Partitioning,
        PartitioningFactory,
        Scanner,
        TaggedRecordBatch,
        UnionDataset,
        UnionDatasetFactory,
        _get_partition_keys,
        _filesystemdataset_write,
    )
E   ModuleNotFoundError: No module named 'pyarrow._dataset'
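The traceback shows that pq.write_to_dataset imports pyarrow.dataset at call time, and that pure-Python module then fails to import the compiled pyarrow._dataset extension, which is not built when Arrow is configured without the dataset component. A minimal sketch of a pre-flight check, assuming you only want to detect the missing component rather than fix the build; module_available is a hypothetical helper, not a PyArrow API. It actually imports the module instead of merely locating it, because pyarrow/dataset.py itself is installed and only its extension import fails.

```python
import importlib


def module_available(name: str) -> bool:
    """Return True if `name` imports cleanly, False otherwise.

    Hypothetical helper for illustration. Importing (not just locating) the
    module also catches the case in this report, where pyarrow/dataset.py
    exists but its compiled dependency pyarrow._dataset was never built.
    """
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


if __name__ == "__main__":
    # Example check before calling pq.write_to_dataset in a custom build.
    for mod in ("pyarrow.parquet", "pyarrow.dataset"):
        status = "available" if module_available(mod) else "missing"
        print(f"{mod}: {status}")
```

In a build like the one described here, pyarrow.parquet would report as available while pyarrow.dataset reports as missing.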
This can be reproduced by running the minimal_build examples:
$ cd arrow/python/examples/minimal_build
$ docker build -t arrow_ubuntu_minimal -f Dockerfile.ubuntu .
or by building Arrow and PyArrow with ARROW_PARQUET enabled but ARROW_DATASET disabled.
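One way to avoid the failure is to enable the dataset component whenever Parquet is enabled. The sketch below gathers the relevant switches in one place. minimal_build_flags is hypothetical and not part of the Arrow repository; the CMake options ARROW_PYTHON, ARROW_PARQUET and ARROW_DATASET are Arrow's own, while the PYARROW_WITH_PARQUET / PYARROW_WITH_DATASET environment variables are assumed to be the switches PyArrow's setup.py reads for optional extension modules, so check them against your checkout.

```python
import warnings


def minimal_build_flags(parquet=True, dataset=None):
    """Return (cmake_args, env) for a hypothetical minimal Arrow + PyArrow build.

    Illustrative helper only. By default ARROW_DATASET follows ARROW_PARQUET,
    because pq.write_to_dataset imports pyarrow.dataset and fails without it.
    """
    if dataset is None:
        dataset = parquet  # keep the two components in step unless told otherwise
    if parquet and not dataset:
        warnings.warn(
            "ARROW_PARQUET=ON without ARROW_DATASET=ON: pq.write_to_dataset "
            "will raise ModuleNotFoundError for pyarrow._dataset"
        )

    def on_off(flag):
        return "ON" if flag else "OFF"

    cmake_args = [
        "-DARROW_PYTHON=ON",
        f"-DARROW_PARQUET={on_off(parquet)}",
        f"-DARROW_DATASET={on_off(dataset)}",
    ]
    # Assumed PyArrow setup.py switches for the optional extension modules.
    env = {
        "PYARROW_WITH_PARQUET": "1" if parquet else "0",
        "PYARROW_WITH_DATASET": "1" if dataset else "0",
    }
    return cmake_args, env


if __name__ == "__main__":
    args, env = minimal_build_flags()
    print("cmake", " ".join(args))
    print(env)
```

Defaulting dataset to the value of parquet, rather than hard-coding it, still allows a deliberate Parquet-only build while making the combination that triggers this bug explicit.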
Issue Links
- relates to: ARROW-16582 [Python] Include DATASET in list of components in PyArrow's dev page (Resolved)
- links to