Apache Arrow / ARROW-13588

[R] Empty character attributes not stored

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.0.0
    • Fix Version/s: None
    • Component/s: R
    • Environment: Ubuntu 20.04, R 4.1 release

    Description

      Date-times in the POSIXct format have a 'tzone' attribute which, by default, is set to "" when created: a length-one character vector holding the empty string (not NULL).

      However, this attribute is not stored in the Arrow feather file. When the file is read back, the original and restored dataframes are not identical, as per the reprex below.

      I assume this is not intended? My workaround at the moment is a check on reading back that sets the tzone attribute to the empty string if it does not exist.

      Just to confirm, the attribute is stored correctly when it is not empty.

      Thanks.

      ``` r
      # POSIXct values created without an explicit tz carry tzone = ""
      dates <- as.POSIXct(c("2020-01-01", "2020-01-02", "2020-01-02"))
      attributes(dates)
      #> $class
      #> [1] "POSIXct" "POSIXt" 
      #> 
      #> $tzone
      #> [1] ""

      values <- c(1:3)
      original <- data.frame(dates, values)
      original
      #>        dates values
      #> 1 2020-01-01      1
      #> 2 2020-01-02      2
      #> 3 2020-01-02      3

      # Round-trip through a Feather file
      # (named 'path' to avoid shadowing the tempfile() function)
      path <- tempfile()
      arrow::write_feather(original, path)

      restored <- arrow::read_feather(path)

      identical(original, restored)
      #> [1] FALSE

      # waldo reports the dropped 'tzone' attribute
      waldo::compare(original, restored)
      #> `attr(old$dates, 'tzone')` is a character vector ('')
      #> `attr(new$dates, 'tzone')` is absent

      unlink(path)
      ```
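
      For reference, the workaround mentioned above could look roughly like the sketch below. The helper name `restore_empty_tzone()` is hypothetical (it is not part of arrow), and the example simulates the read-back result by dropping the attribute by hand, so it runs without arrow installed. It only re-adds the empty 'tzone' attribute to POSIXct columns that lack one.

      ``` r
      # Hypothetical helper, not part of arrow: re-add tzone = "" to any
      # POSIXct column whose 'tzone' attribute is missing.
      restore_empty_tzone <- function(df) {
        for (col in names(df)) {
          x <- df[[col]]
          if (inherits(x, "POSIXct") && is.null(attr(x, "tzone"))) {
            attr(x, "tzone") <- ""
            df[[col]] <- x
          }
        }
        df
      }

      # Simulate a data frame as it comes back from read_feather():
      # the 'tzone' attribute is absent.
      dates <- as.POSIXct(c("2020-01-01", "2020-01-02"))
      attr(dates, "tzone") <- NULL
      restored <- data.frame(dates, values = 1:2)

      fixed <- restore_empty_tzone(restored)
      attr(fixed$dates, "tzone")
      #> [1] ""
      ```

      Columns that already carry a non-empty 'tzone' are left untouched, which matches the observation that non-empty attributes survive the round trip.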
      

       

          People

            Assignee: Unassigned
            Reporter: Charlie Gao (shikokuchuo)
            Votes: 0
            Watchers: 5
