Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
After ARROW-12820, the strptime kernel still ignores the %Z specifier (for timezone names): whatever text appears in that position of the input string is accepted and silently discarded.
For example:
# the %z specifier now works (after ARROW-12820)
>>> pc.strptime(["2022-03-05 09:00:00+01"], format="%Y-%m-%d %H:%M:%S%z", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c1dd21c0>
[
  2022-03-05 08:00:00.000000
]

# in theory this should give the same result, but %Z is still ignored
>>> pc.strptime(["2022-03-05 09:00:00 CET"], format="%Y-%m-%d %H:%M:%S %Z", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c86d1ca0>
[
  2022-03-05 09:00:00.000000
]

# as a result, any garbage in the string is also ignored
>>> pc.strptime(["2022-03-05 09:00:00 blabla"], format="%Y-%m-%d %H:%M:%S %Z", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c1db1ca0>
[
  2022-03-05 09:00:00.000000
]
I don't think this is easy to fix (at least as long as we use the system strptime; see also https://github.com/apache/arrow/pull/11358#issue-1020404727), but we should at least document this limitation / gotcha.
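Until the limitation is fixed or documented, one possible user-side workaround is to validate the trailing timezone token in plain Python before the data reaches Arrow. The following is only a sketch using the standard library: `KNOWN_ABBREVIATIONS` and `parse_with_abbreviation` are hypothetical names, not part of pyarrow, and the table of fixed offsets is an assumption the caller has to supply, since timezone abbreviations are ambiguous in general and a fixed offset ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table (an assumption, not pyarrow API): the caller
# decides which abbreviations are accepted and which fixed offset each means.
KNOWN_ABBREVIATIONS = {
    "UTC": timedelta(0),
    "CET": timedelta(hours=1),
}


def parse_with_abbreviation(value, naive_format="%Y-%m-%d %H:%M:%S"):
    """Parse 'TIMESTAMP ABBR' strictly and return a UTC-aware datetime."""
    # Split off the last space-separated token as the timezone abbreviation.
    timestamp_part, sep, abbreviation = value.rpartition(" ")
    if not sep or abbreviation not in KNOWN_ABBREVIATIONS:
        # Unlike the kernel behaviour described above, reject unknown tokens.
        raise ValueError(f"unrecognized timezone abbreviation in {value!r}")
    local = datetime.strptime(timestamp_part, naive_format)
    offset = timezone(KNOWN_ABBREVIATIONS[abbreviation])
    # Attach the fixed offset, then normalize to UTC.
    return local.replace(tzinfo=offset).astimezone(timezone.utc)


print(parse_with_abbreviation("2022-03-05 09:00:00 CET"))
```

With this approach the "CET" string from the example is shifted to 08:00 UTC, matching the %z result, while the "blabla" string raises a ValueError instead of being parsed silently.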
Issue Links
- is a child of: ARROW-15894 [C++] Strptime issues umbrella (Open)
- is related to: ARROW-12820 [C++] Strptime ignores timezone information (Resolved)