
[R] Problem with new variables in dataset schema


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Resolved
    • Affects Version/s: 6.0.1

    Description

      Hi, 

      I have a problem with the schema not being updated in arrow::open_dataset().

      For example, suppose I have one Parquet file with two columns (a and b) and another file with three columns (a, b, and c). When I open this dataset, its schema only contains columns a and b. Am I missing something? In my previous experience, when I added new columns to some Parquet files that did not exist in the other files, the new columns were automatically added to the schema, which was great.

      The code below reproduces the issue:

       

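      library(arrow)

      # Two Parquet files in the same directory; only the second one has column c
      df <- data.frame(a = 1, b = 2)
      df_2 <- data.frame(a = 2, b = 3, c = 4)
      write_parquet(df, "C:/Data/test2/df1.parquet")
      write_parquet(df_2, "C:/Data/test2/df2.parquet")

      # Open the directory as one dataset and list the columns in its schema
      ds <- arrow::open_dataset(sources = "C:/Data/test2")
      ds_cols <- data.frame(variables = ds$schema$names)
      ds  # the printed schema contains only a and b; c is missing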
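
      A possible workaround, sketched here under assumptions rather than as a confirmed fix: open_dataset() accepts a unify_schemas argument that asks arrow to inspect every file and merge the schemas, and a schema argument that lets you state the full schema yourself. The scratch directory data_dir is hypothetical and stands in for C:/Data/test2 so that the sketch runs anywhere; whether unify_schemas behaves this way in your installed arrow version is worth checking against its documentation.

```r
library(arrow)

# Hypothetical scratch directory (stands in for C:/Data/test2)
data_dir <- file.path(tempdir(), "arrow_schema_demo")
dir.create(data_dir, showWarnings = FALSE)

# Same two files as in the report: only the second one has column c
write_parquet(data.frame(a = 1, b = 2), file.path(data_dir, "df1.parquet"))
write_parquet(data.frame(a = 2, b = 3, c = 4), file.path(data_dir, "df2.parquet"))

# Option 1: ask arrow to scan all files and build a merged schema
ds_unified <- open_dataset(data_dir, unify_schemas = TRUE)
unified_names <- ds_unified$schema$names

# Option 2: declare the full schema explicitly; files that lack
# column c are expected to yield missing values for it
full_schema <- schema(a = float64(), b = float64(), c = float64())
ds_explicit <- open_dataset(data_dir, schema = full_schema)
explicit_names <- ds_explicit$schema$names

print(unified_names)
print(explicit_names)
```

      Scanning every file can be slow on large datasets, so the explicit schema may be the better choice when the set of columns is known in advance.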
      

       

          People

            Unassigned Unassigned
            palgal Pal