Description
While the following snippet works with arrow 3.0.0, it fails after updating to arrow 4.0.0.
An example CSV that can be used to replicate this can be found here
.
├── data
│   └── 2021-04-25-Karlen-pypm.csv
└── test.R
library(arrow)
library(tidyverse)

sch <- schema(forecast_date = string(),
              target = string(),
              target_end_date = string(),
              location = string(),
              type = string(),
              quantile = string(),
              value = string())

ds <- open_dataset("data", format = "csv", schema = sch)
ds %>% select(target) %>% collect()
The error is:
Error: Invalid: In CSV column #3: CSV conversion error to int64: invalid value 'US'
However, the following two calls run without error and return a data frame with the correct schema:
ds %>% collect()
ds %>% select(target, location) %>% collect()
Issue Links
- duplicates ARROW-12500 [C++][Dataset] Consolidate similar tests for file formats (Resolved)