[HTTPCLIENT-2029] URIBuilder cannot parse non-UTF8 URIs - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.5.10
Fix Version/s: 4.5.11, 5.0 Beta7
Component/s: None
Labels: None

Description

URIBuilder always parses a given URI using UTF-8. For example, consider the following URI, which is still encoded in Latin-1 (ISO-8859-1):

http://host/?x=%E4

%E4 is the percent-encoded "ä" character in Latin-1.

new URIBuilder("http://host/?x=%E4").setCharset(ISO_8859_1).getQueryParams().get(0).getValue() outputs "�" (the U+FFFD replacement character) instead of "ä".

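This is because the URIBuilder constructor already parses the given URI, and at that point the charset is always null, so UTF-8 is used. The later setCharset call comes too late to affect the query parameters that have already been decoded.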
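The mismatch can be reproduced without HttpClient at all. The following is a minimal sketch using only the JDK; the class name `CharsetMismatchDemo` and its `decode` helper are made up for illustration. It interprets the single byte 0xE4 (what `%E4` stands for) once as ISO-8859-1 and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical helper: interprets raw bytes in the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // %E4 percent-decodes to the single byte 0xE4.
        byte[] raw = {(byte) 0xE4};

        // In ISO-8859-1 every byte maps directly to a code point, so 0xE4 is U+00E4.
        String latin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // In UTF-8, 0xE4 starts a multi-byte sequence; alone it is malformed,
        // and new String(...) substitutes the replacement character U+FFFD.
        String utf8 = decode(raw, StandardCharsets.UTF_8);

        System.out.println(latin1.equals("\u00E4"));   // Latin-1 yields the intended character
        System.out.println(utf8.contains("\uFFFD"));   // UTF-8 yields the replacement character
    }
}
```

Once the bytes have been decoded with the wrong charset, the original value is gone, which is why the charset has to be known before parsing rather than after.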

Proposed fix:
Provide overloaded constructors that also accept a charset, for example:

    public URIBuilder(final String string, final Charset charset) throws URISyntaxException {
        this.charset = charset;
        digestURI(new URI(string));
    }
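The idea behind the proposed constructor, supplying the charset together with the string to be parsed, can be sketched with the JDK alone. This is not HttpClient code: `QuerySketch` and `parseQuery` are hypothetical names, and the sketch relies on `java.net.URLDecoder`, which follows form-encoding rules (for instance, it turns `+` into a space), so it is only an approximation of what a URI builder would do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QuerySketch {

    // Hypothetical helper: the charset is passed in together with the URI,
    // so it is already known when the query parameters are decoded.
    static Map<String, String> parseQuery(String uri, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        try {
            String rawQuery = new URI(uri).getRawQuery();
            if (rawQuery == null || rawQuery.isEmpty()) {
                return params;
            }
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, charset.name()),
                           URLDecoder.decode(value, charset.name()));
            }
        } catch (URISyntaxException | UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Cannot parse " + uri, e);
        }
        return params;
    }

    public static void main(String[] args) {
        String uri = "http://host/?x=%E4";
        String latin1 = parseQuery(uri, StandardCharsets.ISO_8859_1).get("x");
        String utf8 = parseQuery(uri, StandardCharsets.UTF_8).get("x");
        System.out.println(latin1.equals("\u00E4")); // decoded as intended
        System.out.println(utf8.equals("\u00E4"));   // not the intended character
    }
}
```

Passing the charset at parse time avoids the ordering problem described above, where a setter invoked after construction cannot change values that were decoded in the constructor.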

People

Assignee: Unassigned

Reporter: Matthias Keller

Votes: 0

Watchers: 3

Dates

Created: 12/Nov/19 14:46

Updated: 16/Nov/19 13:11

Resolved: 16/Nov/19 13:11