Apache Arrow / ARROW-12603

[R] open_dataset ignoring provided schema when using select

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.1
    • Component/s: R
    • Labels: None
    • Environment: R version 4.0.5 (2021-03-31)
      Platform: x86_64-pc-linux-gnu (64-bit)

    Description

      The following snippet works with arrow 3.0.0 but fails after updating to arrow 4.0.0.

      An example CSV that reproduces the problem was linked from the original report. The directory layout is:

      .
      ├── data
      │   └── 2021-04-25-Karlen-pypm.csv
      └── test.R
      
      library(arrow)
      library(tidyverse)
      
      sch <- schema(forecast_date=string(),
       target=string(),
       target_end_date=string(),
       location=string(),
       type=string(),
       quantile=string(),
       value=string())
      
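      # Open the CSV directory as a dataset, declaring every column as string
      ds <- open_dataset("data", format = "csv", schema = sch)
      
      # Selecting a single column fails on arrow 4.0.0
      ds %>% select(target) %>% collect()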
      

      The error is:
      Error: Invalid: In CSV column #3: CSV conversion error to int64: invalid value 'US'

      However, the following calls both run successfully and return a data frame with the expected schema:

      ds %>% collect()
      ds %>% select(target, location) %>% collect()
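
The report depends on a CSV file that is not included here, so the following is only a minimal, self-contained sketch of how one might check whether a given arrow version honours the supplied schema. The directory name `arrow-12603`, the file name `example.csv`, and all row values are hypothetical stand-ins. Because the data is tiny, this sketch is not guaranteed to trigger the reported error, which likely depends on the contents of the real file.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in data: the column names match the schema in the report,
# but the values are invented. "location" mixes a numeric-looking code with
# "US" so that type inference, if it ran, would have something to trip over.
data_dir <- file.path(tempdir(), "arrow-12603")
dir.create(data_dir, showWarnings = FALSE)
example <- data.frame(
  forecast_date   = c("2021-04-25", "2021-04-25"),
  target          = c("1 wk ahead inc death", "1 wk ahead inc death"),
  target_end_date = c("2021-05-01", "2021-05-01"),
  location        = c("01", "US"),
  type            = c("point", "point"),
  quantile        = c("NA", "NA"),
  value           = c("100", "200")
)
write.csv(example, file.path(data_dir, "example.csv"),
          row.names = FALSE, quote = FALSE)

# Same all-string schema as in the report.
sch <- schema(
  forecast_date   = string(),
  target          = string(),
  target_end_date = string(),
  location        = string(),
  type            = string(),
  quantile        = string(),
  value           = string()
)

ds <- open_dataset(data_dir, format = "csv", schema = sch)

# Run a single-column query and capture any error instead of aborting.
# Assumption: some arrow versions may read the header line as a data row
# when a schema is supplied; that does not affect the type check below.
result <- tryCatch(
  ds %>% select(location) %>% collect(),
  error = function(e) e
)

if (inherits(result, "error")) {
  message("Query failed: ", conditionMessage(result))
} else {
  message("Query succeeded; 'location' came back as: ", class(result$location)[1])
}
```

If the schema is honoured, `location` should come back as a character column rather than an integer one. Running the script under both arrow versions gives a quick comparison without needing the original file.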
      

            People

              Assignee: Unassigned
              Reporter: Eu Jing Chua (chuaeujing)