Apache Arrow / ARROW-6819

arrow::read_parquet ignores as_data_frame when sparklyr package is attached


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.15.0
    • Fix Version/s: None
    • Component/s: R
    • Labels: None
    • Environment: R version 3.6.1 (2019-07-05) on x86_64, darwin15.6.0 (Mac OS 10.13.4)

    Description

      I am currently using v0.15.0 of the arrow package, installed from source using CRAN. I also have v1.0.4 of the sparklyr package installed. While attempting to read in Parquet data with both packages attached, the read_parquet function appears to ignore the as_data_frame argument (which defaults to TRUE).

      https://github.com/apache/arrow/blob/3d55122c56a508894823a1b79bca71f519fdd52f/r/R/parquet.R#L35-L47

      I am not certain, but I suspect the issue may lie in the way Table__to_dataframe coerces Arrow Table objects into tibbles, since this statement also appears to produce a tibble (I expected a data.frame to be returned):

      # tab is an Arrow Table, e.g. from read_parquet(temp, as_data_frame = FALSE)
      arrow:::Table__to_dataframe(tab, use_threads = FALSE)


      A reproducible example follows.


      # Works as expected: returns a data.frame
      library(arrow)

      temp <- tempfile(fileext = ".parquet")
      # mode = "wb" keeps the binary Parquet file intact on Windows
      download.file("https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet?raw=true", temp, mode = "wb")

      read_parquet(temp, as_data_frame = TRUE)

      # Does not work as expected: returns a tibble
      library(sparklyr)

      read_parquet(temp, as_data_frame = TRUE)
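To tell whether a returned object is a plain data.frame or a tibble, independent of how it prints, one option is to inspect its class vector. The sketch below is a hypothetical illustration and does not call arrow: it hand-builds an object with the class vector tibbles carry (an assumption about what the reported tibble looks like) and checks it with base R's is.data.frame() and inherits().

```r
# Hypothetical sketch: no arrow call here. `df` and `tbl` are illustrative
# objects built by hand, not output from read_parquet().
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Assumption: give a copy the class vector that tibbles carry.
tbl <- df
class(tbl) <- c("tbl_df", "tbl", "data.frame")

print(class(df))
print(class(tbl))

# Both objects pass base R's data.frame checks, even though `tbl` may print
# differently when the tibble package's print methods are loaded.
stopifnot(is.data.frame(df), is.data.frame(tbl))
stopifnot(inherits(tbl, "data.frame"), inherits(tbl, "tbl_df"))
```

In the reproducible example, the same class() and is.data.frame() checks could be applied to the result of read_parquet(temp, as_data_frame = TRUE) before and after library(sparklyr) to see whether the class vector itself changes or only the printed form.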


      People

            Neal Richardson (npr)
            Ryan Patrick Kyle (rpkyle)
            Votes: 0
            Watchers: 3
