  Apache Arrow / ARROW-7825

[R] Update docs to clarify that stringsAsFactors isn't relevant for parquet/feather

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.16.0
    • Fix Version/s: None
    • Component/s: R
    • Environment: Linux 64-bit 5.4.15

    Description

      This is the same issue as reported for feather::read_feather (https://issues.apache.org/jira/browse/ARROW-7823).

      In the R arrow package, the "read_parquet()" function currently does not respect "options(stringsAsFactors = FALSE)": a column written as a factor is read back as a factor even when the option is set, which is inconsistent with the text-file readers.

      Example:

      library(arrow)
      library(readr)

      options(stringsAsFactors = FALSE)

      # Write the same six rows of iris as TSV and as Parquet
      write_tsv(head(iris), 'test.tsv')
      write_parquet(head(iris), 'test.parquet')

      # Both text readers return Species as a character vector
      head(read.delim('test.tsv', sep='\t')$Species)
      # [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
      head(read_tsv('test.tsv', col_types = cols())$Species)
      # [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

      # read_parquet() returns Species as a factor
      head(read_parquet('test.parquet')$Species)
      # [1] setosa setosa setosa setosa setosa setosa
      # Levels: setosa versicolor virginica

      Versions:

      • R 3.6.2
      • arrow_0.15.1.9000
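
The issue title suggests that stringsAsFactors is not relevant for parquet/feather. A likely reason is that these formats store column types in the file, so a column should come back with the type it was written with. The following is a minimal sketch of that behavior, not an official recommendation. The data frame `df`, its column names, and the explicit conversion step at the end are illustrative assumptions that do not appear in the original report.

```r
library(arrow)

# Hypothetical example data: one character column and one factor column.
df <- data.frame(
  name = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# Read the file back; as.data.frame() gives a plain data frame
# regardless of whether read_parquet() returned a tibble.
round_trip <- as.data.frame(read_parquet(path))

# Inspect the class of each column after the round trip.
classes <- vapply(round_trip, function(col) class(col)[1], character(1))
print(classes)

# If character columns are wanted, convert factor columns explicitly
# instead of relying on options(stringsAsFactors = FALSE).
is_factor <- vapply(round_trip, is.factor, logical(1))
as_strings <- round_trip
as_strings[is_factor] <- lapply(as_strings[is_factor], as.character)
```

Under these assumptions, the column written as a character vector stays a character vector and the column written as a factor stays a factor. The explicit conversion is therefore the step that controls the result, not the global option.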

            People

              Assignee: npr Neal Richardson
              Reporter: khughitt Keith Hughitt
              Votes: 0
              Watchers: 4
