Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Version: 3.0.0
- Environment: Windows
Description
I have a directory of split .csv files that I'm importing with open_dataset(). Across files, one column is inferred as int64 in some (e.g. -2) and as string in others (e.g. 1986CD), and this throws an error when unify_schemas = T:
arrow::open_dataset('./split-csvs/nswcr/', format = 'csv', unify_schemas = T)
Error: Invalid: Unable to merge: Field SEIFACalcMethod has incompatible types: int64 vs string
If I use the schema parameter and specify only this column, then only this column is imported:
arrow::open_dataset('./split-csvs/nswcr/', format = 'csv', schema = schema(SEIFACalcMethod = string()))
FileSystemDataset with 45 csv files
SEIFACalcMethod: string
I was expecting that I could set the type of a select few columns, while the rest would be imported as-is, similar to the readr::read_csv(col_types = cols()) approach.
Not sure if this is expected behaviour, a bug, or a possible avenue for improvement. I've tagged this as the latter.
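For illustration, here is a rough sketch of one way to approximate the partial-schema behaviour described above: let arrow infer a full schema, swap the type of the one problem column, and pass the result back as schema. This assumes the first file's inferred schema is acceptable for every other column. override_types is a hypothetical helper written for this sketch, not an arrow function.

```r
library(arrow)

# Hypothetical helper (not part of arrow): return a copy of `sch` in which
# the fields named in `overrides` get the given types; all other fields
# are kept exactly as inferred.
override_types <- function(sch, overrides) {
  fields <- lapply(sch$fields, function(f) {
    if (f$name %in% names(overrides)) field(f$name, overrides[[f$name]]) else f
  })
  do.call(schema, fields)
}

path <- "./split-csvs/nswcr/"
if (dir.exists(path)) {
  # Without unify_schemas, the schema is inferred from the first file only,
  # so this call does not hit the int64 vs string merge error.
  inferred <- open_dataset(path, format = "csv")$schema

  # Force only the conflicting column to string.
  full_schema <- override_types(inferred, list(SEIFACalcMethod = string()))

  ds <- open_dataset(path, format = "csv", schema = full_schema)
}
```

Depending on the arrow version, supplying a full schema for CSV files that have a header row may also require skipping that row (for example with skip = 1), so that is worth checking against the installed version's documentation.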