Apache Arrow / ARROW-7018

[R] Non-UTF-8 data in Arrow <--> R conversion

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 1.0.0
    • Component/s: R
    • Environment: I'm running R on Windows 10

    Description

      Hello.
      I'm new to the arrow package in R and I'm having trouble with special (Icelandic) characters. I have a large data set and everything is fine until I write the file to disk and read it back in (i.e. I use write_parquet() and then read_parquet()). When I read the data back into R, the special characters turn into question marks. For example, Veitingastaðir becomes Veitingasta�ir.

      This does not happen when I use .csv.

      Is there anything I can do when I write the .parquet file to disk or when I read it in to prevent this?
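
The symptom described above is consistent with text stored in a single-byte Windows code page being read back as UTF-8. The following is a minimal sketch of that mechanism in Python, not a reproduction of the arrow code path. It assumes the original R strings were in Windows-1252 (the report does not state the encoding), and the variable names are illustrative only.

```python
# Assumption: the strings were held in a single-byte Windows code page
# (Windows-1252 here) rather than UTF-8. The report does not confirm this.
original = "Veitingasta\u00f0ir"  # "Veitingastaðir"

# In Windows-1252 the letter ð is the single byte 0xF0.
native_bytes = original.encode("cp1252")

# Reading those bytes as UTF-8 fails on 0xF0, which UTF-8 treats as the
# start of a multi-byte sequence; the decoder substitutes U+FFFD.
misread = native_bytes.decode("utf-8", errors="replace")

# Converting to UTF-8 before storing keeps the text intact on the way back.
utf8_bytes = native_bytes.decode("cp1252").encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(misread)
print(round_trip)
```

If this is indeed the cause, the analogous step in R would be to convert character columns with `enc2utf8()` before writing. Whether that resolves this particular report is an assumption, not something the issue text confirms.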

            People

              Assignee: Neal Richardson (npr)
              Reporter: Vidar Ingason (vidaringa)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h