Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Version: 7.0.0
Description
This also came out of the discussion on https://github.com/apache/arrow/issues/12371
You can unify schemas between different Parquet files, but it seems you can't union two (or more) datasets that have different schemas. This is odd, because we do compute the unified schema on this line, only to later assert that all the schemas are the same.
library(arrow)
library(dplyr)

df1 <- arrow_table(x = array(c(1, 2, 3)), y = array(c("a", "b", "c")))
df2 <- arrow_table(x = array(c(4, 5)), z = array(c("d", "e")))

df1 %>% write_dataset("example1", format = "parquet")
df2 %>% write_dataset("example2", format = "parquet")

ds1 <- open_dataset("example1", format = "parquet")
ds2 <- open_dataset("example2", format = "parquet")

# These don't work
ds <- c(ds1, ds2)
# c() actually does the same thing
ds <- open_dataset(list(ds1, ds2))
# This fails due to mismatch in schema
ds <- open_dataset(c("example1", "example2"), format = "parquet", unify_schemas = TRUE)
# This does
ds <- open_dataset(c("example2/part-0.parquet", "example1/part-0.parquet"), format = "parquet", unify_schemas = TRUE)
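For reference, the schema-unification step mentioned above can be looked at in isolation. This is only a minimal sketch, assuming the arrow R package's `unify_schemas()`; the schemas `s1` and `s2` are hypothetical stand-ins that mirror the columns of `df1` and `df2`, not objects from the report itself.

library(arrow)

# Hypothetical schemas mirroring df1 (x, y) and df2 (x, z) from the reprex
s1 <- schema(x = float64(), y = utf8())
s2 <- schema(x = float64(), z = utf8())

# Merge the two schemas; the shared field x must have a compatible type in both
unified <- unify_schemas(s1, s2)

# The unified schema should contain every field seen in either input
print(unified$names)

This only shows that a unified schema can be computed from two differing schemas. It does not work around the failure when combining the datasets themselves.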