Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
It's not all that clear from our docs that if we want to read in a Parquet file and change the schema, we need to call the cast() method on the Table, e.g.
library(arrow)

# Write out data
data <- tibble::tibble(x = c(letters[1:5], NA), y = 1:6)
data_with_schema <- arrow_table(data, schema = schema(x = string(), y = int64()))
write_parquet(data_with_schema, "data_with_schema.parquet")

# Read in data while specifying a schema
data_in <- read_parquet("data_with_schema.parquet", as_data_frame = FALSE)
data_in$cast(target_schema = schema(x = string(), y = int32()))
We should document this more clearly. Perhaps we could even update the code here to automatically do some of this if we pass in a schema to the ... argument of read_parquet and the returned data doesn't match the desired schema?
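As a rough illustration of the behaviour suggested above, here is a minimal sketch of a wrapper that reads a Parquet file as an Arrow Table and casts it only when the schema on disk differs from the requested one. The helper name `read_parquet_as()` and its `target_schema` argument are hypothetical and not part of the arrow package; the sketch only assumes the existing `read_parquet()`, `Schema$Equals()` and `Table$cast()` methods, and it is not a proposal for the final implementation.

```r
library(arrow)

# Hypothetical helper (NOT part of the arrow package): read a Parquet file as
# an Arrow Table and cast it only if its schema differs from `target_schema`.
read_parquet_as <- function(file, target_schema, ...) {
  tbl <- read_parquet(file, as_data_frame = FALSE, ...)
  # Schema$Equals() ignores schema metadata by default
  if (!tbl$schema$Equals(target_schema)) {
    tbl <- tbl$cast(target_schema = target_schema)
  }
  tbl
}

# Write out data with y stored as int64
path <- tempfile(fileext = ".parquet")
data <- data.frame(x = c(letters[1:5], NA), y = 1:6, stringsAsFactors = FALSE)
write_parquet(arrow_table(data, schema = schema(x = string(), y = int64())), path)

# Read it back, asking for y as int32
data_in <- read_parquet_as(path, schema(x = string(), y = int32()))
data_in$schema
```

Because the cast uses the default safe mode, this sketch would raise an error rather than silently truncate if a value did not fit the target type; whether that is the right default for `read_parquet` itself is an open question for this ticket.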