  1. Apache Arrow
  2. ARROW-3543

[R] Time zone adjustment issue when reading Feather file written by Python

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: R
    • Labels:
      None

      Description

      Hello the dream team,

      Pasting from https://github.com/wesm/feather/issues/351

      Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug.

      Consider this

       

      import pandas as pd
      import feather
      import numpy as np
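      # Three naive timestamps (no time zone attached), intended as UTC.
      df = pd.DataFrame({
          'string_time_utc': [
              pd.to_datetime('2018-02-01 14:00:00.531'),
              pd.to_datetime('2018-02-01 14:01:00.456'),
              pd.to_datetime('2018-03-05 14:01:02.200'),
          ]
      })
      # Localize to UTC, convert to US/Eastern, then drop the zone to keep naive wall-clock times.
      df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)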
      df
       Out[17]:
                 string_time_utc           timestamp_est
       0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
       1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
       2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
      

      Here I create the `EST` wall-clock equivalent of each original timestamp (which is in `UTC` time), then drop the time zone so that both columns are naive.

      Now saving the dataframe to `csv` or to `feather` will generate two completely different results.

       

      df.to_csv('P://testing.csv')
      df.to_feather('P://testing.feather')
      

      Switching to R.

      Using the good old `csv` gives me something a bit annoying, but expected. R assumes the time zone is `UTC` by default and wrongly attaches it to `timestamp_est`. No big deal: I can always use `with_tz` or, even better, import the column as character and parse it as a timestamp while in R.

       

      > dataframe <- read_csv('P://testing.csv')
       Parsed with column specification:
       cols(
       X1 = col_integer(),
       string_time_utc = col_datetime(format = ""),
       timestamp_est = col_datetime(format = "")
       )
       Warning message:
       Missing column names filled in: 'X1' [1] 
       > 
       > dataframe %>% mutate(mytimezone = tz(timestamp_est))
      
       # A tibble: 3 x 4
            X1 string_time_utc         timestamp_est           mytimezone
         <int> <dttm>                  <dttm>                  <chr>
       1     0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530 UTC
       2     1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456 UTC
       3     2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200 UTC
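
The CSV route stays recoverable because the wall-clock text itself survives the round trip. A minimal Python sketch of a related idea, writing the offset into the text so the instant is unambiguous on the other side, is shown here. It uses only the standard library, and the fixed UTC-5 offset is an assumption standing in for `US/Eastern` (it holds for the winter dates in this report, not in general).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for US/Eastern (true for these winter dates).
EASTERN = timezone(timedelta(hours=-5))

# First timestamp from the report, as a time-zone-aware UTC instant.
original = datetime(2018, 2, 1, 14, 0, 0, 531000, tzinfo=timezone.utc)
eastern = original.astimezone(EASTERN)

# Serializing with the offset included keeps the instant unambiguous.
text = eastern.isoformat()
restored = datetime.fromisoformat(text)

print(text)                  # 2018-02-01T09:00:00.531000-05:00
print(restored == original)  # True: same instant, different wall-clock rendering
```

Whether a given reader preserves the offset is a separate question; this only illustrates that the text carries enough information to reconstruct the instant.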

      Now look at what happens with feather:
       
       > dataframe <- read_feather('P://testing.feather')
       > 
       > dataframe %>% mutate(mytimezone = tz(timestamp_est))
      
       # A tibble: 3 x 3
         string_time_utc         timestamp_est           mytimezone
         <dttm>                  <dttm>                  <chr>
       1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 ""
       2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 ""
       3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 ""

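      My timestamps have been converted!!! Both columns come back five hours earlier than what I wrote, and `tz()` reports an empty time zone. Pure insanity.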
      Am I missing something here?
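
One possible reading of the feather output, offered as an assumption rather than a diagnosis: the naive timestamps are treated as UTC instants on the way in, and R, seeing an empty time zone, prints them in the session's local zone. The sketch below reproduces the five-hour shift under that assumption. The helper name `as_seen_in_local_zone` is hypothetical, and the fixed UTC-5 offset is a stand-in for a `US/Eastern` session in winter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reading session's local zone is UTC-5 (US/Eastern in winter).
EASTERN = timezone(timedelta(hours=-5))

def as_seen_in_local_zone(naive, local=EASTERN):
    """Treat a naive timestamp as a UTC instant, then render it as local wall-clock time."""
    instant = naive.replace(tzinfo=timezone.utc)
    return instant.astimezone(local).replace(tzinfo=None)

# First row of the DataFrame, exactly as written from Python (both values naive).
utc_naive = datetime(2018, 2, 1, 14, 0, 0, 531000)
est_naive = datetime(2018, 2, 1, 9, 0, 0, 531000)

shifted_utc = as_seen_in_local_zone(utc_naive)
shifted_est = as_seen_in_local_zone(est_naive)

print(shifted_utc)  # 2018-02-01 09:00:00.531000
print(shifted_est)  # 2018-02-01 04:00:00.531000
```

Both printed values match the first row that `read_feather` returned, which is consistent with this reading but does not prove it.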

      Thanks!!

            People

            • Assignee:
              Unassigned
              Reporter:
              Olafsson Olaf