Apache Arrow / ARROW-15884

[C++][Doc] Document that the strptime kernel ignores %Z

Details

    Description

      After ARROW-12820, the strptime kernel still ignores the %Z specifier (for timezone names): the text in the %Z position has no effect on the result, so any string is accepted there.

      For example:

      # the %z specifier now works (after ARROW-12820)
      >>> pc.strptime(["2022-03-05 09:00:00+01"], format="%Y-%m-%d %H:%M:%S%z", unit="us")
      <pyarrow.lib.TimestampArray object at 0x7f00c1dd21c0>
      [
        2022-03-05 08:00:00.000000
      ]
      
      # in theory this should give the same result, but %Z is still ignored
      >>> pc.strptime(["2022-03-05 09:00:00 CET"], format="%Y-%m-%d %H:%M:%S %Z", unit="us")
      <pyarrow.lib.TimestampArray object at 0x7f00c86d1ca0>
      [
        2022-03-05 09:00:00.000000
      ]
      
      # as a result, any garbage in the string is also ignored
      >>> pc.strptime(["2022-03-05 09:00:00 blabla"], format="%Y-%m-%d %H:%M:%S %Z", unit="us")
      <pyarrow.lib.TimestampArray object at 0x7f00c1db1ca0>
      [
        2022-03-05 09:00:00.000000
      ]
      

      I don't think this is easy to fix, at least as long as we use the system strptime (see also https://github.com/apache/arrow/pull/11358#issue-1020404727). But we should at least document this limitation / gotcha.
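
      Until the limitation is documented or fixed, one possible user-side workaround is to validate the timezone suffix before parsing and rewrite it as a numeric offset, so that the working %z specifier can be used instead of %Z. The sketch below is only an illustration using the standard library: the helper name replace_tz_abbreviation and the small abbreviation table are hypothetical, and because timezone abbreviations are ambiguous in general, the table would have to be chosen by the caller.

```python
from datetime import datetime, timezone

# Hypothetical, deliberately tiny mapping (an assumption for illustration).
# Abbreviations are ambiguous in general, so callers must supply their own.
TZ_ABBREVIATION_OFFSETS = {"UTC": "+0000", "CET": "+0100", "CEST": "+0200"}


def replace_tz_abbreviation(value, offsets=TZ_ABBREVIATION_OFFSETS):
    """Swap a trailing ' <abbreviation>' for a numeric offset, or raise."""
    # Split on the last space: everything after it is the candidate suffix.
    head, sep, abbreviation = value.rpartition(" ")
    if not sep or abbreviation not in offsets:
        # Reject garbage such as "blabla" instead of silently ignoring it.
        raise ValueError(f"unrecognized timezone suffix in {value!r}")
    return head + offsets[abbreviation]


normalized = replace_tz_abbreviation("2022-03-05 09:00:00 CET")

# The normalized string carries a numeric offset, so %z applies instead of %Z.
parsed = datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S%z")
parsed_utc = parsed.astimezone(timezone.utc)
```

      Here Python's own datetime.strptime is used only to check the normalized string. The same string could presumably be passed to pc.strptime with the "%Y-%m-%d %H:%M:%S%z" format from the first example above, but that combination has not been verified here.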

            People

              Assignee: Unassigned
              Reporter: Joris Van den Bossche (jorisvandenbossche)