HttpComponents HttpClient / HTTPCLIENT-1978

Unicode header values are converted into mojibake

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5.7, 5.0 Beta3
    • Fix Version/s: 4.5.9, 5.0 Beta5
    • Component/s: HttpClient (classic)
    • Labels: None

    Description

      Unicode handling in header values is badly broken, as the examples below show:

      httpget.addHeader("X-I-Expect-This-Header", "Федор Достоевский") => X-I-Expect-This-Header: $54>@ >AB>52A:89

      httpget.addHeader("X-I-Expect-This-Header", "宮本茂") => X-I-Expect-This-Header: �,

      httpget.addHeader("X-I-Expect-This-Header", "Ἀριστοτέλης") => X-I-Expect-This-Header:���Ŀĭ���

      The root cause is here:

              for (int i1 = off, i2 = oldlen; i2 < newlen; i1++, i2++) {
                  this.array[i2] = (byte) b[i1];
              }
      

      In this code, b is of type char[] and array is of type byte[]. According to JLS § 5.1.3 ("Narrowing Primitive Conversion"), "[a] narrowing conversion of a char to an integral type T likewise simply discards all but the n lowest order bits, where n is the number of bits used to represent type T." With T = byte, n is 8, so only the low 8 bits of each UTF-16 code unit reach the wire and the high 8 bits are silently dropped.
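
The effect of that cast can be reproduced outside HttpClient. The sketch below is a stand-alone illustration, not the library's code: the class name NarrowingDemo and the helper narrow are hypothetical, and the input "Федор" is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

final class NarrowingDemo {
    // Hypothetical helper that mimics the quoted loop:
    // each char is cast to byte, which keeps only its low 8 bits.
    static byte[] narrow(char[] b, int off, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = (byte) b[off + i];
        }
        return out;
    }

    public static void main(String[] args) {
        // U+0424 U+0435 U+0434 U+043E U+0440 (the first word of the Cyrillic example)
        char[] chars = "\u0424\u0435\u0434\u043e\u0440".toCharArray();
        byte[] bytes = narrow(chars, 0, chars.length);
        // Low bytes are 0x24 0x35 0x34 0x3E 0x40, i.e. the ASCII characters $ 5 4 > @
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // prints "$54>@"
    }
}
```

The output matches the start of the first mangled header value shown above.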

      There are a few ways we could fix this, and any of them would be better than what we are doing now. The two I'll propose for consideration are:

      1. Just write UTF-8 to the wire; non-ASCII characters should be tolerated as obs-text
      2. Replace non-ASCII characters with an empty string, space, or question mark
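
The two proposals can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not necessarily the fix that was committed: the class HeaderValueEncoding and both method names are hypothetical, and '?' is just one of the replacement characters suggested above.

```java
import java.nio.charset.StandardCharsets;

final class HeaderValueEncoding {
    // Proposal 1 (sketch): write the value as UTF-8.
    // Non-ASCII characters become multi-byte sequences in the 0x80-0xFF range.
    static byte[] encodeUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Proposal 2 (sketch): keep ASCII as-is and substitute '?' for everything else.
    // Works per UTF-16 code unit, so a surrogate pair yields two '?' bytes.
    static byte[] encodeAsciiWithReplacement(String value) {
        byte[] out = new byte[value.length()];
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            out[i] = (byte) (ch < 0x80 ? ch : '?');
        }
        return out;
    }

    public static void main(String[] args) {
        // The three-character CJK example, written with escapes
        String value = "\u5bae\u672c\u8302";
        System.out.println(encodeUtf8(value).length); // 3 BMP CJK characters, 3 bytes each
        System.out.println(new String(encodeAsciiWithReplacement(value), StandardCharsets.US_ASCII));
    }
}
```

Proposal 1 is lossless for a receiver that decodes the value as UTF-8; proposal 2 loses information but keeps the header pure ASCII. Either way the output no longer depends on which low byte each character happens to have.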

      See also: https://issues.apache.org/jira/browse/HTTPCLIENT-1974

          People

            Assignee: Unassigned
            Reporter: Ryan Schmitt (rschmitt)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 50m