Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The encoding option is honored when a single file is read with read_delim_arrow, but not when a folder is opened with open_dataset.
read_delim_arrow creates a reader using CsvTableReader$create (which is what is tested in the package's tests).
open_dataset creates a factory instead, and I'm unable to follow what happens when $Finish() is called.
Also, the documentation ("CsvReadOptions" page) lists the "encoding" option under "CsvConvertOptions$create()" instead of "CsvReadOptions$create()".
library(dplyr)
library(arrow)

# Opens one file just fine:
one_file <- arrow::read_delim_arrow(
  "test/Test1.txt",
  as_data_frame = FALSE,
  delim = ";",
  read_options = CsvReadOptions$create(encoding = "ISO-8859-1")
)
collect(one_file)

# Can't open the folder that contains "Test1.txt" properly;
# Column2 ends up typed as binary:
one_folder <- arrow::open_dataset(
  "test",
  delim = ";",
  read_options = CsvReadOptions$create(encoding = "ISO-8859-1")
)
collect(one_folder)

# Even when specifying the schema:
one_folder_w_schema <- arrow::open_dataset(
  "test",
  schema = Schema$create(Column1 = string(), Column2 = string()),
  format = FileFormat$create(
    "text",
    skip_rows = 1L,
    delimiter = ";",
    column_names = c("Column1", "Column2"),
    read_options = CsvReadOptions$create(encoding = "ISO-8859-1")
  )
)
collect(one_folder_w_schema)
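A possible workaround, sketched under some assumptions: re-encode the files to UTF-8 before handing the folder to open_dataset, so no encoding option is needed at all. The attached Test1.txt is not reproduced here, so the fixture contents, the temporary directory names, and the helper reencode_to_utf8() are all hypothetical. The validUTF8() check shows that the Latin-1 bytes are not valid UTF-8, which is presumably why Column2 ends up as binary.

```r
# Hypothetical fixture standing in for test/Test1.txt (the values are made up).
src_dir <- file.path(tempdir(), "latin1_src")
dst_dir <- file.path(tempdir(), "utf8_copy")
dir.create(src_dir, showWarnings = FALSE)
dir.create(dst_dir, showWarnings = FALSE)

lines_utf8 <- c("Column1;Column2", "1;caf\u00e9", "2;na\u00efve")
writeLines(
  iconv(lines_utf8, from = "UTF-8", to = "ISO-8859-1"),
  file.path(src_dir, "Test1.txt"),
  useBytes = TRUE  # write the Latin-1 bytes exactly as they are
)

# Hypothetical helper: re-encode one delimited text file to UTF-8.
reencode_to_utf8 <- function(in_path, out_path, from = "ISO-8859-1") {
  raw_lines <- readLines(in_path, warn = FALSE)
  utf8_lines <- iconv(raw_lines, from = from, to = "UTF-8")
  if (anyNA(utf8_lines)) {
    stop("Could not convert ", in_path, " from ", from)
  }
  writeLines(utf8_lines, out_path, useBytes = TRUE)
  invisible(out_path)
}

# Convert every file in the source folder into the UTF-8 copy.
for (f in list.files(src_dir, full.names = TRUE)) {
  reencode_to_utf8(f, file.path(dst_dir, basename(f)))
}

src_lines <- readLines(file.path(src_dir, "Test1.txt"), warn = FALSE)
dst_lines <- readLines(file.path(dst_dir, "Test1.txt"), warn = FALSE)
print(validUTF8(src_lines))  # TRUE FALSE FALSE: the data rows are not valid UTF-8
print(validUTF8(dst_lines))  # TRUE TRUE TRUE

# Only runs if arrow and dplyr are installed; opens the UTF-8 copy as a dataset.
if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  ds <- arrow::open_dataset(dst_dir, format = "text", delimiter = ";")
  print(dplyr::collect(ds))
}
```

This costs an extra copy of the data on disk, so it is only a stopgap and not a substitute for open_dataset honoring the encoding option itself.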
Attachments
Issue Links
- is blocked by: ARROW-16000 [C++][Dataset] Support Latin-1 encoding (Resolved)