[ARROW-10485] [R] Accept partitioning in open_dataset when file paths are hive-style - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 7.0.0
Component/s: R
Labels:
- pull-request-available
Environment:
MacOS Catalina 10.15.7 (19H2), R 4.01, arrow R package v2.0.0

External issue URL:
https://github.com/apache/arrow/issues/26460

Description

When a dataset is written with hive_style = TRUE (now the default), it has to be opened without an explicit partitioning argument to work as expected. If partitioning is specified, even with the correct partition field, any query that filters the dataset on that field returns 0 rows.

As a user, I'd want this to raise an error (not just a warning), probably when open_dataset() is first called.

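library(dplyr)  # provides %>%, filter() and collect()

data("mtcars")
# Write mtcars partitioned by cyl, using hive-style directory names
arrow::write_dataset(
    dataset = mtcars, path = "mtcarstest", partitioning = "cyl",
    format = "parquet", hive_style = TRUE)

# mtc1: opened with an explicit partitioning; mtc2: partitioning left unspecified
mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
mtc2 <- arrow::open_dataset("mtcarstest")

# Returns 0 rows, although the partition field is named correctly
mtc1 %>%
     dplyr::filter(cyl == 4) %>%
     collect()

# Works as expected
mtc2 %>%
     dplyr::filter(cyl == 4) %>%
     collect()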
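
For comparison, here is a minimal sketch of a third way to open the same kind of dataset: declaring the partitioning as hive-style explicitly with hive_partition(). This is not part of the original report. The scratch path, the int32() type for cyl, and the variable names are assumptions made for illustration, and behavior may differ between arrow versions.

```r
library(arrow)
library(dplyr)

# Hypothetical scratch location, so the example does not touch "mtcarstest"
ds_path <- tempfile("mtcars_hive_")

# Write mtcars partitioned by cyl with hive-style directory names
write_dataset(mtcars, ds_path, partitioning = "cyl",
              format = "parquet", hive_style = TRUE)

# Assumption: state that the paths are hive-style and give the key a type,
# instead of passing only the field name as in the report
ds <- open_dataset(ds_path, partitioning = hive_partition(cyl = int32()))

four_cyl <- ds %>%
  filter(cyl == 4) %>%
  collect()

nrow(four_cyl)
```

If the filter still returns 0 rows on a given version, opening the dataset without a partitioning argument, as mtc2 does above, remains the form the report describes as working.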

Issue Links

fixes: ARROW-14743 [C++] Error reading in dataset when partitioning variable in schema (Resolved)

relates to: ARROW-15310 [C++][Python][Dataset] Detect (and warn?) when DirectoryPartitioning is parsing an actually hive-style file path? (Open)

links to: GitHub Pull Request #12133

People

Assignee: Neal Richardson

Reporter: John Sheffield

Votes: 0

Watchers: 7

Dates

Created: 03/Nov/20 20:46

Updated: 11/Jan/23 08:13

Resolved: 13/Jan/22 22:36

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 3h 50m