Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 6.0.0
Environment: macOS Mojave, R 4.1.1
Description
DragosMG: I believe this is a bug that should be fixed in the C++ code, as there is no option we could use on the R side.
I have a draft PR with a failing test; it is identical to Andy's reproducible example below.
Original description below:
======================
When a CSV file starts with a byte order mark (BOM), arrow::open_dataset() reads the file but fills the first column with NA values. A similar issue appears to have been raised and fixed here: https://issues.apache.org/jira/browse/ARROW-5413. read_csv_arrow() handles the BOM correctly.
Reproducible Example:
library(arrow)
library(dplyr)

writeLines('\xef\xbb\xbfa,b\n1,2\n', con = "testfile.csv")

read_csv_arrow("testfile.csv") # works
#> # A tibble: 1 × 2
#>       a     b
#>   <int> <int>
#> 1     1     2

open_dataset("testfile.csv", format = "csv") |> collect()
#> # A tibble: 1 × 2
#>       a     b
#>   <int> <int>
#> 1    NA     2
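On affected versions, one possible workaround is to strip the three-byte UTF-8 BOM (EF BB BF) from the file before calling open_dataset(). The sketch below is not part of arrow's API: the helper name strip_utf8_bom() is hypothetical, it uses only base R (readBin(), writeBin(), file.size()), and it rewrites the file in place, so run it on a copy if the original must be preserved.

```r
# Hypothetical helper, not part of arrow: remove a leading UTF-8 BOM in place.
# Returns TRUE (invisibly) if a BOM was found and removed, FALSE otherwise.
strip_utf8_bom <- function(path) {
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  # Read the whole file as raw bytes
  bytes <- readBin(path, what = "raw", n = file.size(path))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    # Rewrite the file without its first three bytes
    writeBin(bytes[-(1:3)], path)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}
```

For the example above, calling strip_utf8_bom("testfile.csv") before open_dataset("testfile.csv", format = "csv") should leave the first header as plain "a", so the first column is no longer read as NA.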
Issue Links
- causes: ARROW-15041 [R] Flaky BOM removal test (Resolved)
- links to