  1. HttpComponents HttpClient
  2. HTTPCLIENT-1149

EntityUtils.toString should detect Byte order mark (BOM) and remove it if present


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 4.1.2
    • Fix Version/s: None
    • Component/s: HttpClient (classic)
    • Environment: Windows

    Description

      EntityUtils.toString should detect a byte order mark (BOM) at the start of the input stream and remove it; otherwise the BOM is decoded as stray characters at the start of the returned string.
      The possible byte order marks are listed at http://en.wikipedia.org/wiki/Byte_order_mark
      I'm not sure whether EntityUtils.toString uses the BOM to detect the encoding, but if it doesn't, it should.

      An example URL that triggers this issue is the Microsoft Virtual Earth WSDL file:
      HttpClient httpclient = new DefaultHttpClient();
      HttpGet httpget = new HttpGet("http://dev.virtualearth.net/webservices/v1/searchservice/searchservice.svc?wsdl");
      HttpResponse response = httpclient.execute(httpget);
      HttpEntity entity = response.getEntity();
      String textContents = EntityUtils.toString(entity);
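
      Since the issue was resolved as Won't Fix, a caller can work around it by decoding the bytes itself. The following is a minimal sketch of that idea, not HttpClient code: the class BomAwareDecoder and its decode method are hypothetical names, and it assumes only UTF-8 and UTF-16 BOMs need handling (UTF-32 BOMs are not detected).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HttpClient or HttpCore.
final class BomAwareDecoder {

    private BomAwareDecoder() {}

    /**
     * Decodes raw bytes to a String. If the data starts with a UTF-8 or
     * UTF-16 BOM, the BOM selects the charset and is dropped from the result.
     * Otherwise the caller-supplied fallback charset is used.
     */
    static String decode(byte[] data, Charset fallback) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return new String(data, 3, data.length - 3, StandardCharsets.UTF_8);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return new String(data, 2, data.length - 2, StandardCharsets.UTF_16BE);
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            // Assumption: this is UTF-16LE; a UTF-32LE BOM (FF FE 00 00) is not checked.
            return new String(data, 2, data.length - 2, StandardCharsets.UTF_16LE);
        }
        return new String(data, fallback);
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

      With the example above, the last line could then become something like String textContents = BomAwareDecoder.decode(EntityUtils.toByteArray(entity), StandardCharsets.UTF_8); where the fallback charset is a guess and would ideally come from the response's Content-Type header.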

          People

            Assignee: Unassigned
            Reporter: Ian Beaumont (ibeaumont)
            Votes: 0
            Watchers: 1