Apache Arrow / ARROW-10485

[R] Accept partitioning in open_dataset when file paths are hive-style


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 7.0.0
    • Component/s: R
    • Environment: macOS Catalina 10.15.7 (19H2), R 4.0.1, arrow R package v2.0.0

    Description

      When a dataset is written with hive_style = TRUE (now the default), it must be opened without an explicit partitioning definition to work as expected. Even if the correct partition field is specified, any query that filters on the partition field returns 0 rows.

       

      As a user, I'd want this to raise an explicit error (not just a warning), probably when open_dataset() is first called.

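      library(dplyr)  # provides %>%

      data("mtcars")
      # Write mtcars partitioned by cyl; hive_style = TRUE names the
      # directories cyl=4, cyl=6, cyl=8
      arrow::write_dataset(
          dataset = mtcars, path = "mtcarstest", partitioning = "cyl",
          format = "parquet", hive_style = TRUE)
      
      # Partition field named explicitly
      mtc1 <- arrow::open_dataset("mtcarstest", partitioning = "cyl")
      # Partitioning discovered from the hive-style paths
      mtc2 <- arrow::open_dataset("mtcarstest")
      
      # Returns 0 rows
      mtc1 %>%
           dplyr::filter(cyl == 4) %>%
           dplyr::collect()
      
      # Returns the four-cylinder rows as expected
      mtc2 %>%
           dplyr::filter(cyl == 4) %>%
           dplyr::collect()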
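A minimal sketch, assuming the arrow and dplyr packages are installed, of two ways to open a hive-style dataset so that filters on the partition key match rows: letting open_dataset() discover the cyl=... keys from the paths, or passing an explicit hive_partition() schema. The int32 type for cyl and the temporary path are assumptions for illustration. One plausible reading of the 0-row result (an interpretation, not stated in the report) is that partitioning = "cyl" is treated as plain directory partitioning, so a directory name such as cyl=4 becomes the literal value of cyl and never equals 4.

```r
library(arrow)
library(dplyr)

# Illustrative location; the report wrote to "mtcarstest" in the working directory
path <- file.path(tempdir(), "mtcarstest")
write_dataset(mtcars, path, partitioning = "cyl",
              format = "parquet", hive_style = TRUE)

# Expect directories named cyl=4, cyl=6, cyl=8
print(list.files(path))

# Option 1: let open_dataset() detect the hive-style key=value segments
ds_auto <- open_dataset(path)

# Option 2: declare the partitioning as hive-style with an explicit type
# (int32 is an assumed type for the cyl key)
ds_hive <- open_dataset(path, partitioning = hive_partition(cyl = int32()))

four_auto <- ds_auto %>% filter(cyl == 4) %>% collect()
four_hive <- ds_hive %>% filter(cyl == 4) %>% collect()

print(nrow(four_auto))
print(nrow(four_hive))
```

Both variants should return the 11 four-cylinder cars in mtcars, matching the behaviour the report sees for mtc2.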
      
       

            People

              Neal Richardson
              John Sheffield

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 50m