  1. Apache Arrow
  2. ARROW-18181 [R] read_csv_arrow() Improvements
  3. ARROW-15992

[R] Support encoding options for CSVs in open_dataset


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      The encoding option set in read_options takes effect when a single file is read with read_delim_arrow, but not when a folder is opened with open_dataset.

      read_delim_arrow creates a reader using CsvTableReader$create (which is what is tested in the package's tests).

      open_dataset creates a factory and I'm unable to follow what happens when $Finish() is called.

       

      Also, the documentation (the "CsvReadOptions" page) lists the "encoding" option under "CsvConvertOptions$create()" instead of "CsvReadOptions$create()".
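
      The reproduction below depends on the attached Test1.txt, whose exact contents are not shown in this report. As a minimal sketch, a stand-in file can be generated with base R. The row values here are assumptions (only the column names, the ";" delimiter and the ISO-8859-1 encoding come from the report), and the folder is created under tempdir() rather than the working directory.

```r
# Hypothetical stand-in for the attached Test1.txt; the row values are assumed.
fixture_dir <- file.path(tempdir(), "test")
dir.create(fixture_dir, showWarnings = FALSE, recursive = TRUE)
fixture_path <- file.path(fixture_dir, "Test1.txt")

# Start from UTF-8 text and convert it to ISO-8859-1 bytes.
utf8_lines <- c("Column1;Column2", "1;caf\u00e9", "2;na\u00efve")
latin1_lines <- iconv(utf8_lines, from = "UTF-8", to = "ISO-8859-1")

# Binary mode plus useBytes = TRUE writes the bytes exactly as converted.
con <- file(fixture_path, open = "wb")
writeLines(latin1_lines, con, useBytes = TRUE)
close(con)

# Read the raw bytes back to confirm the file is not valid UTF-8.
fixture_bytes <- readBin(fixture_path, what = "raw", n = file.size(fixture_path))
is_valid_utf8 <- validUTF8(rawToChar(fixture_bytes))
```

      With this file in place, is_valid_utf8 should be FALSE, which is the condition under which a reader that assumes UTF-8 cannot treat Column2 as a string.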

       

      library(dplyr)
      library(arrow)
      # Opens one file just fine:
      one_file <- arrow::read_delim_arrow(
        "test/Test1.txt", 
        as_data_frame = FALSE,
        delim = ";",
        read_options = CsvReadOptions$create(encoding = "ISO-8859-1")
      )
      collect(one_file)
       
      # Opening the folder that contains "Test1.txt" does not apply the encoding: Column2 comes back typed as binary
      one_folder <- arrow::open_dataset(
        "test",
        format = "text",  # set explicitly; open_dataset() defaults to "parquet"
        delim = ";",
        read_options = CsvReadOptions$create(encoding = "ISO-8859-1")
      )
      collect(one_folder)
       
      # Same result even when the schema is specified
      one_folder_w_schema <- arrow::open_dataset(
        "test",
        schema = Schema$create(Column1 = string(), Column2 = string()),
        format = FileFormat$create(
          "text",
          skip_rows = 1L,
          delimiter = ";",
          column_names = c("Column1", "Column2"),
          read_options = CsvReadOptions$create(encoding = "ISO-8859-1")
        )
      )
      collect(one_folder_w_schema)
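
      Until open_dataset honors the encoding option, one possible workaround is to transcode the files to UTF-8 before opening the folder. The sketch below uses only base R; transcode_to_utf8 is a hypothetical helper written for illustration, not part of arrow, and the sample file and folder names are assumptions.

```r
# Hypothetical helper: rewrite a text file as UTF-8, given its source encoding.
transcode_to_utf8 <- function(src, dest, from = "ISO-8859-1") {
  raw_lines <- readLines(src, warn = FALSE)             # bytes are read as-is
  utf8_lines <- iconv(raw_lines, from = from, to = "UTF-8")
  if (anyNA(utf8_lines)) {
    stop("could not convert ", src, " from ", from)
  }
  con <- file(dest, open = "wb")
  on.exit(close(con))
  writeLines(utf8_lines, con, useBytes = TRUE)          # write UTF-8 bytes unchanged
  invisible(dest)
}

# Assumed sample input: one ISO-8859-1 file in a source folder.
src_dir <- file.path(tempdir(), "latin1_in")
utf8_dir <- file.path(tempdir(), "utf8_out")
dir.create(src_dir, showWarnings = FALSE)
dir.create(utf8_dir, showWarnings = FALSE)
sample_con <- file(file.path(src_dir, "Test1.txt"), open = "wb")
writeLines(
  iconv(c("Column1;Column2", "1;caf\u00e9"), from = "UTF-8", to = "ISO-8859-1"),
  sample_con,
  useBytes = TRUE
)
close(sample_con)

# Convert every file in the source folder into the UTF-8 folder.
for (src in list.files(src_dir, full.names = TRUE)) {
  transcode_to_utf8(src, file.path(utf8_dir, basename(src)))
}
converted <- readLines(file.path(utf8_dir, "Test1.txt"), warn = FALSE)
```

      The converted folder (utf8_dir here) could then be passed to open_dataset without any encoding option, on the assumption that the CSV reader treats input as UTF-8 by default. This costs an extra copy of the data, so it is a stopgap rather than a fix.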

       

      Attachments

        1. Test1.txt
          0.1 kB
          Gregoire Leleu

            People

              Assignee: Unassigned
              Reporter: Gregoire Leleu (gregleleu)
              Votes: 0
              Watchers: 3
