Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 1.0.1
- Fix Version/s: None
- Environment: macOS 10.15.7, Python 3.8.2
Description
Hi,
I'm working with Parquet files generated by an AWS RDS Postgres snapshot export.
I'm trying to parse a date column stored as a string into a timestamp, but the cast fails.
I've managed to parse the same date format (as in the first example below) when reading from a CSV, so I investigated as far as I could on my own. Here are my results:
import pyarrow as pa
import pytz

#################################################################################
## the format I get from the database
us_tz_arr = pa.array([
    "2014-12-07 07:48:59.285332+00",
    "2014-12-07 08:01:49.758975+00",
    "2014-12-07 10:11:35.884304+00"])
us_tz_arr.cast(pa.timestamp('us', tz=pytz.UTC))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304+00

#################################################################################
## tried removing the timezone
us_arr = pa.array([
    "2014-12-07 07:48:59.285332",
    "2014-12-07 08:01:49.758975",
    "2014-12-07 10:11:35.884304"])
us_arr.cast(pa.timestamp('us'))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35.884304

#################################################################################
## tried removing the microseconds but keeping the timezone
second_tz_arr = pa.array([
    "2014-12-07 07:48:59+00",
    "2014-12-07 08:01:49+00",
    "2014-12-07 10:11:35+00"])
second_tz_arr.cast(pa.timestamp('s', tz=pytz.UTC))
# -> ArrowInvalid: Failed to parse string: 2014-12-07 10:11:35+00

#################################################################################
## removing microseconds and timezone, makes it work!
s_arr = pa.array([
    "2014-12-07 07:48:59",
    "2014-12-07 08:01:49",
    "2014-12-07 10:11:35"])
s_arr.cast(pa.timestamp('s'))
# -> <pyarrow.lib.TimestampArray object at 0x7fbdf81ae460>
# [
#   2014-12-07 07:48:59,
#   2014-12-07 08:01:49,
#   2014-12-07 10:11:35
# ]
PS. This is my first bug report, so apologies if important things are missing.
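A possible workaround for the failing casts above is to parse the strings in plain Python before handing them to Arrow. The sketch below uses only the standard library and assumes the export always writes `YYYY-MM-DD HH:MM:SS`, optionally followed by up to six fractional digits and a UTC offset such as `+00`. The names `parse_pg_timestamp` and `to_epoch_us` are hypothetical helpers introduced for illustration; they are not part of pyarrow.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout: "YYYY-MM-DD HH:MM:SS", optional ".ffffff", optional "+HH[[:]MM]".
_PG_TIMESTAMP = re.compile(
    r"^(?P<base>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r"(?:\.(?P<frac>\d{1,6}))?"
    r"(?:(?P<sign>[+-])(?P<hh>\d{2})(?::?(?P<mm>\d{2}))?)?$"
)

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_pg_timestamp(text):
    """Parse one timestamp string; aware if an offset is present, else naive."""
    match = _PG_TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    parsed = datetime.strptime(match["base"], "%Y-%m-%d %H:%M:%S")
    if match["frac"]:
        # Right-pad the fraction so ".5" becomes 500000 microseconds.
        parsed = parsed.replace(microsecond=int(match["frac"].ljust(6, "0")))
    if match["sign"]:
        # Hours-only offsets such as "+00" get zero minutes.
        offset = timedelta(hours=int(match["hh"]), minutes=int(match["mm"] or 0))
        if match["sign"] == "-":
            offset = -offset
        parsed = parsed.replace(tzinfo=timezone(offset))
    return parsed


def to_epoch_us(moment):
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (moment - _EPOCH) // timedelta(microseconds=1)


# The three strings from the first failing example in the report.
raw = [
    "2014-12-07 07:48:59.285332+00",
    "2014-12-07 08:01:49.758975+00",
    "2014-12-07 10:11:35.884304+00",
]
epoch_us = [to_epoch_us(parse_pg_timestamp(value)) for value in raw]
```

The resulting integers could then presumably be loaded as an int64 Arrow array and cast to a microsecond timestamp type. That step is left out here and has not been checked against pyarrow 1.0.1.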
Issue Links
- is fixed by: ARROW-12820 [C++] Strptime ignores timezone information (Resolved)
- is related to: ARROW-13625 [C++][CSV] Timestamp parsing should accept any valid ISO 8601 without requiring custom parse strings (Resolved)