Apache Arrow / ARROW-8726

[C++][Dataset] Mis-specified DirectoryPartitioning incorrectly uses the file name as value

      Description

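      Calling filter() followed by collect() on a dataset opened with a mis-specified partitioning causes a segfault. In the example below, the files sit one directory level deep, but the partitioning names two fields. Although this is clearly an input error, it would be helpful if an error or warning indicated that the partitioning does not match the directory layout.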

      library(arrow)
      library(dplyr)
      
      dir.create("multi_mtcars/one", recursive = TRUE)
      dir.create("multi_mtcars/two", recursive = TRUE)
      write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet")
      write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet")
      
      ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing"))
      
      # the following will segfault
      ds %>%
        filter(cyl > 8) %>% 
        collect()
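
The mismatch in the reproduction can be illustrated without arrow. The following is a minimal base-R sketch, not Arrow's implementation: `assign_segments` is a hypothetical helper that pairs partition field names with path segments by position, which is the behavior the issue title describes. It assumes paths use "/" as the separator.

```r
# Hypothetical helper, not part of arrow: pair each partition field name
# with the path segment at the same depth below the dataset root.
assign_segments <- function(path, base_dir, field_names) {
  # Drop the "<base_dir>/" prefix, then split the remainder on "/".
  relative <- substring(path, nchar(base_dir) + 2)
  segments <- strsplit(relative, "/", fixed = TRUE)[[1]]
  # Nothing here stops the last segment (the file name) from being consumed.
  setNames(segments[seq_along(field_names)], field_names)
}

path <- "multi_mtcars/one/mtcars.parquet"

# Two field names, one directory level: "nothing" is paired with the file name.
mis_specified <- assign_segments(path, "multi_mtcars", c("level", "nothing"))
print(mis_specified)

# One field name, one directory level: only the directory is consumed.
well_specified <- assign_segments(path, "multi_mtcars", "level")
print(well_specified)
```

For the layout in this report, a single field name (for example `partitioning = "level"`) would match the one directory level that actually exists.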
      

              People

              • Assignee:
                fsaintjacques Francois Saint-Jacques
                Reporter:
                jonkeane Jonathan Keane

                  Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h