Apache Arrow / ARROW-18266

[R] Make it more obvious how to read in a Parquet file with a different schema to the inferred one


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      It's not all that clear from our docs that if we want to read in a Parquet file with a different schema from the one stored in the file, we need to read it in as an Arrow Table (as_data_frame = FALSE) and then call the cast() method on that Table, e.g.

      library(arrow)

      # Write out data, with y stored as int64
      data <- tibble::tibble(x = c(letters[1:5], NA), y = 1:6)
      data_with_schema <- arrow_table(data, schema = schema(x = string(), y = int64()))
      write_parquet(data_with_schema, "data_with_schema.parquet")

      # Read the data in as an Arrow Table, then cast it to the desired schema
      data_in <- read_parquet("data_with_schema.parquet", as_data_frame = FALSE)
      data_in <- data_in$cast(target_schema = schema(x = string(), y = int32()))
      

      We should document this more clearly. Perhaps we could even update read_parquet() to do some of this automatically: if we pass in a schema to the ... argument of read_parquet() and the data read in doesn't match it, cast the result to that schema?
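      A minimal sketch of what that automatic cast could look like, written as a standalone wrapper rather than as a change to read_parquet() itself. read_parquet_as() is a hypothetical name and is not part of arrow; it relies only on read_parquet(as_data_frame = FALSE), the Table's $schema field, Schema$Equals() and Table$cast(). It assumes the target schema has the same field names, in the same order, as the file, because Table$cast() only changes column types and does not rename or reorder columns.

      ```r
      library(arrow)

      # Hypothetical helper (not part of arrow): read a Parquet file and, if its
      # stored schema differs from `target_schema`, cast the Table to it.
      read_parquet_as <- function(file, target_schema, as_data_frame = TRUE) {
        tbl <- read_parquet(file, as_data_frame = FALSE)
        # Equals() ignores schema metadata by default (check_metadata = FALSE)
        if (!tbl$schema$Equals(target_schema)) {
          # cast() uses safe = TRUE by default, so values that don't fit the
          # target type (e.g. int64 overflowing int32) raise an error
          tbl <- tbl$cast(target_schema)
        }
        if (as_data_frame) as.data.frame(tbl) else tbl
      }
      ```

      Casting after the read leaves the Parquet reader itself untouched; the trade-off is that the file is first materialised with its stored types, so any cast is an extra pass over the data.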


          People

            Assignee: Unassigned
            Reporter: Nicola Crane (thisisnic)
            Votes: 0
            Watchers: 2
