  1. Nutch
  2. NUTCH-2814

HttpDateFormat's internal time zone may change after parsing a date

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.17
    • Fix Version/s: 1.18
    • Component/s: protocol
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      In the Common Crawl WARC files I've observed that the If-Modified-Since header is sent with varying time zones:

      If-Modified-Since: Tue, 25 Feb 2020 03:33:21 MSK
      If-Modified-Since: Sun, 22 Sep 2019 04:41:48 GMT
      If-Modified-Since: Mon, 18 Nov 2019 12:06:19 KRAT
      If-Modified-Since: Tue, 21 Jan 2020 02:10:22 UTC
      If-Modified-Since: Fri, 18 Oct 2019 20:23:57 BST
      If-Modified-Since: Sun, 20 Oct 2019 08:39:26 CEST
      If-Modified-Since: Fri, 15 Nov 2019 12:56:38 EST
      If-Modified-Since: Mon, 30 Mar 2020 09:10:33 GMT
      If-Modified-Since: Mon, 30 Mar 2020 05:18:36 GMT
      If-Modified-Since: Fri, 28 Feb 2020 03:09:16 PST
      If-Modified-Since: Thu, 21 Nov 2019 10:16:19 YEKT
      If-Modified-Since: Thu, 14 Nov 2019 18:01:05 EET
      If-Modified-Since: Thu, 14 Nov 2019 16:46:43 UTC
      If-Modified-Since: Sun, 17 Nov 2019 13:14:28 UTC
      If-Modified-Since: Tue, 25 Feb 2020 21:46:10 GMT
      If-Modified-Since: Wed, 16 Oct 2019 19:03:31 UTC
      If-Modified-Since: Thu, 14 Nov 2019 09:07:13 EST
      If-Modified-Since: Thu, 09 Apr 2020 12:21:53 EEST
      If-Modified-Since: Sat, 28 Mar 2020 19:08:52 CET
      If-Modified-Since: Sun, 23 Feb 2020 12:22:46 CET
      If-Modified-Since: Mon, 21 Oct 2019 03:18:16 PDT
      If-Modified-Since: Fri, 15 Nov 2019 05:41:44 UTC
      If-Modified-Since: Thu, 09 Apr 2020 21:01:32 CEST
      If-Modified-Since: Wed, 11 Dec 2019 11:18:28 KRAT
      If-Modified-Since: Tue, 22 Oct 2019 18:55:54 GMT
      

      This happens because the time zone of HttpDateFormat's internal SimpleDateFormat may change when a date is parsed: parsing a date string that names a time zone overwrites the formatter's own time zone. The next formatting call then uses the time zone of the last parsed date.
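
The behavior can be reproduced outside of Nutch with a plain SimpleDateFormat. The following is a minimal sketch, not the actual HttpDateFormat code or the committed patch: the class HttpDateZoneSketch and its methods are hypothetical names, and restoring GMT after each parse is only one possible guard.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; it is not Nutch's HttpDateFormat.
final class HttpDateZoneSketch {

  private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

  // Builds a formatter for the HTTP date pattern, initially pinned to GMT.
  static SimpleDateFormat newHttpFormat() {
    SimpleDateFormat format =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
    format.setTimeZone(GMT);
    return format;
  }

  // Parses one date with a fresh formatter and returns the ID of the
  // time zone the formatter is left with afterwards.
  static String zoneIdAfterParse(String httpDate) {
    SimpleDateFormat format = newHttpFormat();
    try {
      format.parse(httpDate);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Unparseable date: " + httpDate, e);
    }
    return format.getTimeZone().getID();
  }

  // One possible guard (an assumption, not the committed fix):
  // always restore GMT after parsing, even if parsing fails.
  static Date parseAndRestoreZone(SimpleDateFormat format, String httpDate) {
    try {
      return format.parse(httpDate);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Unparseable date: " + httpDate, e);
    } finally {
      format.setTimeZone(GMT);
    }
  }

  public static void main(String[] args) {
    String pst = "Fri, 28 Feb 2020 03:09:16 PST";
    // The zone ID printed here is no longer "GMT".
    System.out.println("zone after plain parse: " + zoneIdAfterParse(pst));

    SimpleDateFormat format = newHttpFormat();
    Date parsed = parseAndRestoreZone(format, pst);
    // With the guard in place the formatted date ends in "GMT" again.
    System.out.println("formatted after guarded parse: " + format.format(parsed));
  }
}
```

Note that SimpleDateFormat is not thread-safe, so a shared instance needs synchronization around both the parse and the zone reset.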

      The use of "GMT" as the time zone of HTTP dates is specified in sec. 7.1.1.1 of RFC 7231.
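
For illustration, a date in the fixed GMT form can also be produced with an immutable java.time formatter, which has no mutable time zone state. This is a sketch under assumptions: ImfFixdateSketch and toHttpDate are hypothetical names, and the snippet is not taken from Nutch.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper; the class and method names are illustrative only.
final class ImfFixdateSketch {

  // Immutable formatter: date fields are rendered in UTC and the
  // zone designator is the literal text "GMT".
  private static final DateTimeFormatter IMF_FIXDATE =
      DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
          .withZone(ZoneOffset.UTC);

  static String toHttpDate(Instant instant) {
    return IMF_FIXDATE.format(instant);
  }

  public static void main(String[] args) {
    // prints "Sun, 22 Sep 2019 04:41:48 GMT"
    System.out.println(toHttpDate(Instant.parse("2019-09-22T04:41:48Z")));
  }
}
```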

              People

              • Assignee:
                snagel Sebastian Nagel
              • Reporter:
                snagel Sebastian Nagel
