Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
When reading a CSV file that has a header row while also supplying a schema, we get an error because the code tries to read the header as a row of data.
library(arrow)

share_data <- tibble::tibble(
  company = c("AMZN", "GOOG", "BKNG", "TSLA"),
  price = c(3463.12, 2884.38, 2300.46, 732.39)
)
readr::write_csv(share_data, file = "share_data.csv")

share_schema <- schema(
  company = utf8(),
  price = float64()
)

# Fails: the header row is parsed as data
read_csv_arrow("share_data.csv", schema = share_schema)
Error: Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:492  decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84  status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:496  parser.VisitColumn(col_index, visit)
The correct fix is for the user to pass skip = 1 to read_csv_arrow(), but this is not obvious from the error message returned from C++. We should capture the error and raise our own message with rlang::abort(), telling the user what went wrong and suggesting how to prevent it.
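A minimal sketch of what that error handling could look like, assuming the header can be detected by checking whether the rejected value is one of the schema's column names. The helper name `handle_csv_read_error()` and the wording of the hints are hypothetical and not part of the arrow package; the C++ error is simulated so the example runs without a CSV file.

```r
# Hypothetical helper (not the actual arrow implementation).
# Re-raises a CSV conversion error with a hint about skip = 1 when the
# rejected value matches one of the column names in the schema.
handle_csv_read_error <- function(e, schema_names) {
  msg <- conditionMessage(e)

  # Extract the quoted value from "... invalid value 'price'";
  # yields NA when the message does not have that shape.
  bad_value <- regmatches(msg, regexec("invalid value '([^']*)'", msg))[[1]][2]

  if (!is.na(bad_value) && bad_value %in% schema_names) {
    rlang::abort(c(
      msg,
      i = "The file appears to have a header row, which was read as data.",
      i = "If you supply `schema`, also set `skip = 1` to skip the header."
    ))
  }

  # Any other error is re-raised unchanged.
  stop(e)
}

# Simulate the error from this issue.
simulated <- simpleError(
  "Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'"
)

result <- tryCatch(
  handle_csv_read_error(simulated, schema_names = c("company", "price")),
  error = function(e) e
)

# An unrelated error passes through untouched.
other <- tryCatch(
  handle_csv_read_error(simpleError("boom"), schema_names = c("company", "price")),
  error = function(e) e
)
```

In real use the helper would presumably sit in the error handler of a tryCatch() around the read_csv_arrow() call, with the column names taken from the schema (for example via names(share_schema), assuming that returns the field names). Until such a message exists, the workaround for the example above is read_csv_arrow("share_data.csv", schema = share_schema, skip = 1).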
For similar examples (and their associated PRs), see ARROW-11766 and ARROW-12791.