HttpComponents HttpClient / HTTPCLIENT-1978

Unicode header values are converted into mojibake


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5.7, 5.0 Beta3
    • Fix Version/s: 4.5.9, 5.0 Beta5
    • Component/s: HttpClient (classic)
    • Labels:
      None

      Description

      Non-ASCII characters in header values are corrupted when the request is written to the wire, as the examples below show:

      httpget.addHeader("X-I-Expect-This-Header", "Федор Достоевский") => X-I-Expect-This-Header: $54>@ >AB>52A:89

      httpget.addHeader("X-I-Expect-This-Header", "宮本茂") => X-I-Expect-This-Header: �,

      httpget.addHeader("X-I-Expect-This-Header", "Ἀριστοτέλης") => X-I-Expect-This-Header:���Ŀĭ���

      The root cause is here:

              for (int i1 = off, i2 = oldlen; i2 < newlen; i1++, i2++) {
                  this.array[i2] = (byte) b[i1];
              }
      

      In this code, b is of type char[] and array is of type byte[]. According to JLS § 5.1.3 ("Narrowing Primitive Conversion"), "[a] narrowing conversion of a char to an integral type T likewise simply discards all but the n lowest order bits, where n is the number of bits used to represent type T."
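      The effect of that cast can be reproduced in isolation. The following is a minimal sketch, not the library's code: the class name NarrowingDemo and the helper narrow are hypothetical, and the helper only imitates the char-to-byte cast from the loop above. For the first word of the Cyrillic example, each char loses its high byte (U+0424 becomes 0x24, which is ASCII '$'), matching the start of the mangled header reported above.

```java
import java.nio.charset.StandardCharsets;

public class NarrowingDemo {
    // Hypothetical helper: imitates the cast in the loop above by keeping
    // only the low 8 bits of each 16-bit char.
    static byte[] narrow(String s) {
        char[] chars = s.toCharArray();
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) chars[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // "Fedor" in Cyrillic, written with escapes to stay encoding-neutral:
        // U+0424 U+0435 U+0434 U+043E U+0440
        byte[] bytes = narrow("\u0424\u0435\u0434\u043e\u0440");
        // Low bytes are 0x24 0x35 0x34 0x3E 0x40, so this prints: $54>@
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```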

      There are a few ways we could fix this, and any of them would be better than what we are doing now. The two I'll propose for consideration are:

      1. Just write UTF-8 to the wire; non-ASCII characters should be tolerated as obs-text
      2. Replace non-ASCII characters with an empty string, space, or question mark
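      The two options can be sketched as follows. This is an illustration under assumptions, not the change that was eventually committed: the class HeaderEncodingSketch and both method names are hypothetical, and the sketch operates on a plain String rather than on the library's internal buffers.

```java
import java.nio.charset.StandardCharsets;

public class HeaderEncodingSketch {
    // Option 1 (hypothetical name): write the value as UTF-8. Non-ASCII
    // characters become multi-byte sequences with the high bit set.
    static byte[] encodeUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Option 2 (hypothetical name): keep ASCII chars and substitute '?'
    // for everything else. A supplementary character is two chars in a
    // Java String, so it yields two question marks here.
    static byte[] encodeAsciiLossy(String value) {
        byte[] out = new byte[value.length()];
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            out[i] = (byte) (c < 0x80 ? c : '?');
        }
        return out;
    }

    public static void main(String[] args) {
        String value = "a\u0424b"; // 'a', CYRILLIC CAPITAL LETTER EF, 'b'
        System.out.println(encodeUtf8(value).length);
        System.out.println(new String(encodeAsciiLossy(value), StandardCharsets.US_ASCII));
    }
}
```

      Option 1 preserves the original text for receivers that decode header bytes as UTF-8; option 2 loses information but keeps the output strictly ASCII.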

      See also: https://issues.apache.org/jira/browse/HTTPCLIENT-1974

            People

            • Assignee: Unassigned
            • Reporter: Ryan Schmitt (rschmitt)

