  1. HttpComponents HttpClient
  2. HTTPCLIENT-1257

Header location automatically converted to ASCII even though location can contain UTF-8 encoded urls

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 4.2.2
    • Fix Version/s: None
    • Component/s: HttpClient (classic)
    • Labels: None

    Description

      I'm trying to fetch:

      http://handheld.vn/content.php?4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch

      Which returns:

      2012-10-29 18:54:29,355 DEBUG http.wire: << "HTTP/1.1 303 See Other[\r][\n]" [main]
      2012-10-29 18:54:29,355 DEBUG http.wire: << "Date: Mon, 29 Oct 2012 17:55:57 GMT[\r][\n]" [main]
      2012-10-29 18:54:29,355 DEBUG http.wire: << "Server: Apache[\r][\n]" [main]
      2012-10-29 18:54:29,355 DEBUG http.wire: << "Expires: Thu, 19 Nov 1981 08:52:00 GMT[\r][\n]" [main]
      2012-10-29 18:54:29,356 DEBUG http.wire: << "Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0[\r][\n]" [main]
      2012-10-29 18:54:29,356 DEBUG http.wire: << "Pragma: no-cache[\r][\n]" [main]
      2012-10-29 18:54:29,356 DEBUG http.wire: << "Set-Cookie: bb_lastactivity=0; expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/[\r][\n]" [main]
      2012-10-29 18:54:29,356 DEBUG http.wire: << "Location: http://handheld.vn/content/4052-????nh-gi??-m??y-t??nh-b???ng-Kindle-Fire-HD-7-inch[\r][\n]" [main]
      2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Length: 0[\r][\n]" [main]
      2012-10-29 18:54:29,357 DEBUG http.wire: << "Connection: close[\r][\n]" [main]
      2012-10-29 18:54:29,357 DEBUG http.wire: << "Content-Type: text/html[\r][\n]" [main]
      2012-10-29 18:54:29,357 DEBUG http.wire: << "[\r][\n]" [main]
      2012-10-29 18:54:29,357 DEBUG conn.DefaultClientConnection: Receiving response: HTTP/1.1 303 See Other [main]
      2012-10-29 18:54:29,357 DEBUG http.headers: << HTTP/1.1 303 See Other [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Date: Mon, 29 Oct 2012 17:55:57 GMT [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Server: Apache [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Expires: Thu, 19 Nov 1981 08:52:00 GMT [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Pragma: no-cache [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Set-Cookie: bb_lastactivity=0; expires=Tue, 29-Oct-2013 17:55:57 GMT; path=/ [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Location: http://handheld.vn/content/4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Content-Length: 0 [main]
      2012-10-29 18:54:29,358 DEBUG http.headers: << Connection: close [main]
      2012-10-29 18:54:29,359 DEBUG http.headers: << Content-Type: text/html [main]

      Unfortunately, I can't get the resolved URL with the following code:

      Header locationHeader = response.getFirstHeader("location");
      which will return http://handheld.vn/content/4052-Đánh-giá-máy-tính-bảng-Kindle-Fire-HD-7-inch

      The header value has already been decoded with the wrong character encoding, so I can never recover the redirect URL.

      I understand that this is not RFC-compliant behavior, but the above URL and redirect work fine in all browsers.
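For context on the RFC point: a URI on the wire is expected to contain only ASCII, with other characters written as percent-escaped UTF-8 bytes. The following is a minimal sketch of that conversion using only the JDK. The class `LocationEscaper`, the method `toAsciiUri`, and the shortened sample URL are hypothetical illustrations, not part of HttpClient or of this report.

```java
import java.net.URI;

public class LocationEscaper {

    // Hypothetical helper: returns the ASCII-only form of a URI string.
    // java.net.URI accepts non-ASCII characters when parsing, and
    // toASCIIString() writes them out as UTF-8 percent-escapes.
    static String toAsciiUri(String raw) {
        return URI.create(raw).toASCIIString();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the redirect target, written with Unicode
        // escapes so the source file encoding does not matter.
        String raw = "http://handheld.vn/content/4052-\u0110\u00e1nh-gi\u00e1";
        System.out.println(toAsciiUri(raw));
    }
}
```

A server that sent the redirect target in this escaped form would avoid the header decoding question altogether, since every byte of the `Location` value would be plain ASCII.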

      Is it possible to access the raw header as a byte array so that I can choose the encoding myself? That would help a lot. Alternatively, a parameter to specify the encoding when fetching a header value would also work.
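When the raw bytes are not exposed, one possible workaround is to re-decode the string that was already produced. The sketch below rests on an assumption this report does not confirm: that the header was decoded by mapping each byte to exactly one character (ISO-8859-1 style). If non-ASCII bytes were instead replaced with `?`, the original bytes are lost and this cannot recover them. The class `HeaderRedecoder` and the method `redecodeAsUtf8` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class HeaderRedecoder {

    // Hypothetical helper: undoes a byte-per-char (ISO-8859-1) decoding
    // and reinterprets the recovered bytes as UTF-8.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] rawBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the assumed failure: UTF-8 bytes decoded as ISO-8859-1.
        String original = "4052-\u0110\u00e1nh-gi\u00e1";
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        // Prints "true" when the round trip restores the original text.
        System.out.println(redecodeAsUtf8(misdecoded).equals(original));
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to a distinct character. The same trick does not work with a decoder that discards or substitutes unmappable bytes.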

          People

            Assignee: Unassigned
            Reporter: Thibaut (bluelu)

              Time Tracking

                Original Estimate: 1h
                Remaining Estimate: 1h
                Time Spent: Not Specified