Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The command to open a dataset in R, open_dataset(), accepts both a schema and a partitioning argument. If one accidentally passes a partitioning object as the schema, the dataset appears to have been read successfully, but subsequent operations on it segfault.
Although this is user error, we should validate that the schema argument is in fact a Schema object and raise an error if it is not, so that nobody is confronted with a segfault later.
### begin setup
# Note: this exact code is called in test-dataset.R (lines 18-87), so when adding
# the test to that file you don't need to copy this, but can use the code at
# the bottom of this chunk in that test if you want.
library(arrow)  # provides write_parquet(), open_dataset(), hive_partition()
library(dplyr)

make_temp_dir <- function() {
  path <- tempfile()
  dir.create(path)
  normalizePath(path, winslash = "/")
}

hive_dir <- make_temp_dir()

first_date <- lubridate::ymd_hms("2015-04-29 03:12:39")
df1 <- tibble(
  int = 1:10,
  dbl = as.numeric(1:10),
  lgl = rep(c(TRUE, FALSE, NA, TRUE, FALSE), 2),
  chr = letters[1:10],
  fct = factor(LETTERS[1:10]),
  ts = first_date + lubridate::days(1:10)
)

second_date <- lubridate::ymd_hms("2017-03-09 07:01:02")
df2 <- tibble(
  int = 101:110,
  dbl = c(as.numeric(51:59), NaN),
  lgl = rep(c(TRUE, FALSE, NA, TRUE, FALSE), 2),
  chr = letters[10:1],
  fct = factor(LETTERS[10:1]),
  ts = second_date + lubridate::days(10:1)
)

dir.create(file.path(hive_dir, "subdir", "group=1", "other=xxx"), recursive = TRUE)
dir.create(file.path(hive_dir, "subdir", "group=2", "other=yyy"), recursive = TRUE)
write_parquet(df1, file.path(hive_dir, "subdir", "group=1", "other=xxx", "file1.parquet"))
write_parquet(df2, file.path(hive_dir, "subdir", "group=2", "other=yyy", "file2.parquet"))
### end setup

# This (the correct specification) works just fine
ds <- open_dataset(hive_dir, partitioning = hive_partition(other = utf8(), group = uint8()))
ds$schema

# But if you aren't explicit with the argument names, it looks like everything works...
ds <- open_dataset(hive_dir, hive_partition(other = utf8(), group = uint8()))
# ...but the dataset is malformed and will segfault when you interact with it, for example:
ds$schema
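The validation proposed in the description could look roughly like the sketch below. It is only an illustration and not the fix that was actually merged. The helper name assert_is_schema() is hypothetical, and the stand-in objects only mimic class attributes (the class names "Schema" and "HivePartitioning" are assumptions about what the real objects inherit from), so the snippet runs without the arrow package.

```r
# Hypothetical helper, not part of the arrow package API.
# It rejects anything that is neither NULL nor a Schema before the value
# could reach lower-level code.
assert_is_schema <- function(schema, arg = "schema") {
  # NULL means "no schema supplied", which is allowed.
  if (is.null(schema) || inherits(schema, "Schema")) {
    return(invisible(schema))
  }
  stop(
    sprintf(
      "`%s` must be a Schema object, not an object of class %s",
      arg, class(schema)[1]
    ),
    call. = FALSE
  )
}

# Stand-in objects that only carry class attributes (assumed class names).
fake_schema <- structure(list(), class = c("Schema", "ArrowObject"))
fake_partitioning <- structure(
  list(),
  class = c("HivePartitioning", "Partitioning", "ArrowObject")
)

# A Schema-like object passes silently.
assert_is_schema(fake_schema)

# A partitioning passed in the schema position now fails early with a
# readable error; the message is captured here instead of aborting.
res <- tryCatch(
  assert_is_schema(fake_partitioning),
  error = function(e) conditionMessage(e)
)
print(res)
```

Calling a check like this at the top of the dataset-opening function would turn the unnamed-argument mistake from the reproducer into an immediate R error instead of a later segfault.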