Apache Arrow / ARROW-16720

[R] Cannot read datasets partitioned by columns starting with dots

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 8.0.0
    • Fix Version/s: 9.0.0
    • Component/s: R
    • Labels: None

    Description

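      arrow::open_dataset() finds no files when a dataset is partitioned by a column whose name starts with a dot (e.g. ".cyl").
      This might be because the partition directories then start with a dot, and files and directories starting with dots are treated as hidden.
      There is no issue if the dot appears elsewhere in the column name.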
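
      To illustrate the hidden-file convention this guess refers to, here is a minimal sketch in base R only. The scratch directory and file names are hypothetical and the files are empty placeholders, not real Parquet files. It shows how base::list.files() treats dot-prefixed directories by default; whether Arrow's dataset discovery applies a similar rule is an assumption of this report, not something the sketch demonstrates.

```r
# Hypothetical scratch layout: one dot-prefixed and one regular partition directory
root <- tempfile()
dir.create(file.path(root, ".cyl=4"), recursive = TRUE)
dir.create(file.path(root, "cyl=4"))

# Empty placeholder files standing in for the Parquet parts
file.create(file.path(root, ".cyl=4", "part-0.parquet"))
file.create(file.path(root, "cyl=4", "part-0.parquet"))

# Default listing: names starting with a dot count as hidden
visible <- list.files(root, recursive = TRUE)

# all.files = TRUE also returns entries under dot-prefixed directories
everything <- list.files(root, recursive = TRUE, all.files = TRUE)

print(visible)
print(everything)
```

      This is why the reprex below passes all.files = TRUE when checking that the partition files were actually written.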

      Reprex:

      library(dplyr)
      library(arrow)
      
      packageVersion("arrow")
      #> [1] '8.0.0'
      
      path_arrow_tmp <- tempfile()
      
      mtcars %>% 
         dplyr::group_by(cyl) %>% 
         arrow::write_dataset(
            path = path_arrow_tmp
         )
      
      base::list.files(path_arrow_tmp, recursive = TRUE, all.files = TRUE)
      #> [1] "cyl=4/part-0.parquet" "cyl=6/part-0.parquet" "cyl=8/part-0.parquet"
      
      mtcars_load <- path_arrow_tmp %>% 
         arrow::open_dataset() %>% 
         dplyr::collect()
      
      setequal(mtcars$mpg, mtcars_load$mpg)
      #> [1] TRUE
      
      # Now group by a column named ".cyl" instead
      
      path_arrow_tmp_grp <- tempfile()
      
      mtcars %>% 
         dplyr::mutate(.cyl = cyl) %>% 
         dplyr::group_by(.cyl) %>% 
         arrow::write_dataset(
            path = path_arrow_tmp_grp
         )
      
      # the files are there
      base::list.files(path_arrow_tmp_grp, recursive = TRUE, all.files = TRUE)
      #> [1] ".cyl=4/part-0.parquet" ".cyl=6/part-0.parquet" ".cyl=8/part-0.parquet"
      
      # 0 files detected
      path_arrow_tmp_grp %>% 
         arrow::open_dataset()
      #> FileSystemDataset with 0 Parquet files
      
      # Specify partitioning manually
      # still no files
      
      path_arrow_tmp_grp %>% 
         arrow::open_dataset(
            partitioning = ".cyl",
            hive_style = TRUE
         )
      #> FileSystemDataset with 0 Parquet files
      #> .cyl: int32
      

            People

              Assignee: Unassigned
              Reporter: Lorenzo Gaborini (LorenzoGbr)