Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version: 2.0.0
- Environment: macOS Catalina 10.15.7 (19H2), R 4.0.1, arrow R package v2.0.0
Description
When writing a dataset with hive_style = TRUE (now the default), the dataset must be opened without an explicit partitioning definition in order to work as expected. If the partitioning column is specified explicitly in open_dataset(), even correctly, any query filtering on that partition field returns 0 rows.
As a user, I would want this to raise an explicit error (not just a warning), probably at the point open_dataset() is first called.
data("mtcars") arrow::write_dataset( dataset = mtcars, path = "mtcarstest", partitioning = "cyl", format = "parquet", hive_style = TRUE) mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl") mtc2 <- arrow::open_dataset("mtcarstest") mtc1 %>% dplyr::filter(cyl == 4) %>% collect() mtc2 %>% dplyr::filter(cyl == 4) %>% collect()
Issue Links
- fixes
  - ARROW-14743 [C++] Error reading in dataset when partitioning variable in schema (Resolved)
- relates to
  - ARROW-15310 [C++][Python][Dataset] Detect (and warn?) when DirectoryPartitioning is parsing an actually hive-style file path? (Open)
- links to