Apache Arrow / ARROW-7825

[R] Update docs to clarify that stringsAsFactors isn't relevant for parquet/feather

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.16.0
    • Fix Version/s: None
    • Component/s: R
    • Labels:
    • Environment: Linux 64-bit 5.4.15

      Description

      Same issue as reported for feather::read_feather (https://issues.apache.org/jira/browse/ARROW-7823).

      In the R arrow package, "read_parquet()" currently does not respect "options(stringsAsFactors = FALSE)", which leads to unexpected behavior that is inconsistent with "read.delim()" and "readr::read_tsv()".

      Example:

      library(arrow)
      library(readr)

      # Ask R not to convert strings to factors when reading data
      options(stringsAsFactors = FALSE)

      # Write the same rows (Species is a factor) to TSV and to Parquet
      write_tsv(head(iris), 'test.tsv')
      write_parquet(head(iris), 'test.parquet')

      # Both text readers return Species as a character vector
      head(read.delim('test.tsv', sep='\t')$Species)
      # [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
      head(read_tsv('test.tsv', col_types = cols())$Species)
      # [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

      # read_parquet() returns Species as a factor
      head(read_parquet('test.parquet')$Species)
      # [1] setosa setosa setosa setosa setosa setosa
      # Levels: setosa versicolor virginica

      Versions:

      • R 3.6.2
      • arrow_0.15.1.9000
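
As the issue title suggests, stringsAsFactors governs how text input is converted to columns, whereas Parquet and Feather files store each column's type, so a factor written to such a file is read back as a factor. For anyone who wants character columns regardless, a minimal sketch of one possible workaround follows. It uses base R only, and the helper name factors_to_character is hypothetical and not part of the arrow package.

```r
# Base R only; no arrow calls are needed to illustrate the point.
options(stringsAsFactors = FALSE)

df <- head(iris)
class(df$Species)  # "factor": the option does not alter columns that already exist

# Hypothetical helper (not part of arrow): convert every factor column
# of a data frame to character and leave all other columns untouched.
factors_to_character <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  df[is_fct] <- lapply(df[is_fct], as.character)
  df
}

df_chr <- factors_to_character(df)
class(df_chr$Species)  # "character"
```

Under these assumptions, the helper could be applied to a data frame before passing it to write_parquet(), or to the result of read_parquet(), depending on whether the stored file should keep the factor type.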

              People

              • Assignee: Neal Richardson (npr)
              • Reporter: Keith Hughitt (khughitt)