Description
Hello, I have a problem with encoding using Apache CXF.
I send a request to an external SOAP service and get a response whose Content-Type HTTP header has no charset parameter. The service simply doesn't send one.
Apache CXF then assumes the body is ISO-8859-1 encoded, but the content is actually UTF-8 and contains both Latin and Cyrillic characters.
As a result, the Cyrillic values come out garbled.
Here is an example of a response that gets decoded with the wrong encoding.
HTTP headers:
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 15 Jun 2017 05:01:50 GMT
Content-Type: text/xml
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Content-Encoding: gzip
Test SOAP response:
(Invalid values in ns12:PaymentDocumentID, ns13:region, ns13:city, ns13:address_string and so on)
<?xml version="1.0" encoding="UTF-8" ?>
<ns2:Envelope xmlns:ns2="http://schemas.xmlsoap.org/soap/envelope/"
              xmlns:ns10="http://dom.gosuslugi.ru/schema/integration/organizations-base/"
              xmlns:ns11="http://dom.gosuslugi.ru/schema/integration/payments-base/"
              xmlns:ns12="http://dom.gosuslugi.ru/schema/integration/bills-base/"
              xmlns:ns13="http://dom.gosuslugi.ru/schema/integration/payment/"
              xmlns:ns3="http://www.w3.org/2000/09/xmldsig#"
              xmlns:ns4="http://dom.gosuslugi.ru/schema/integration/base/"
              xmlns:ns5="http://dom.gosuslugi.ru/schema/integration/account-base/"
              xmlns:ns6="http://dom.gosuslugi.ru/schema/integration/nsi-base/"
              xmlns:ns7="http://dom.gosuslugi.ru/schema/integration/individual-registry-base/"
              xmlns:ns8="http://dom.gosuslugi.ru/schema/integration/metering-device-base/"
              xmlns:ns9="http://dom.gosuslugi.ru/schema/integration/organizations-registry-base/">
  <ns2:Header>
    <ns4:ResultHeader>
      <ns4:Date>2017-06-15T08:05:56.336+03:00</ns4:Date>
      <ns4:MessageGUID>a29d26c2-f2d1-48ea-be11-a47bd175b40a</ns4:MessageGUID>
    </ns4:ResultHeader>
  </ns2:Header>
  <ns2:Body>
    <ns13:getStateResult ns4:version="10.0.1.1">
      <ns4:RequestState>3</ns4:RequestState>
      <ns4:MessageGUID>4fcb1240-5188-11e7-a67f-005056b6513d</ns4:MessageGUID>
      <ns13:exportPaymentDocumentDetailsResult>
        <ns13:Charge>
          <ns13:PaymentDocument>
            <ns12:PaymentDocumentID>40АА164719-01-7051</ns12:PaymentDocumentID>
            <ns12:PaymentDocumentNumber>10</ns12:PaymentDocumentNumber>
            <ns5:UnifiedAccountNumber>40АА164719</ns5:UnifiedAccountNumber>
            <ns5:AccountNumber>40АА164719</ns5:AccountNumber>
            <ns5:ServiceID>40АА164719-01</ns5:ServiceID>
            <ns13:PaymentDocumentDetails>
              <ns13:ConsumerInformation>
                <ns13:address>
                  <ns13:region>Ярославская</ns13:region>
                  <ns13:city>Ярославль</ns13:city>
                  <ns13:housenum>34а</ns13:housenum>
                  <ns13:FIASHouseGuid>3d0978ee-6d63-468a-9167-dac0bf36a1bc</ns13:FIASHouseGuid>
                  <ns13:apartment>12</ns13:apartment>
                  <ns13:address_string>150029, обл. Ярославская, г. Ярославль, д. 34а</ns13:address_string>
                </ns13:address>
              </ns13:ConsumerInformation>
              <ns13:ExecutorInformation>
                <ns10:INN>3808008510</ns10:INN>
                <ns13:Legal>
                  <ns10:KPP>380801001</ns10:KPP>
                  <ns13:Name>Тестовая организация1</ns13:Name>
                </ns13:Legal>
                <ns13:PaymentInformation>
                  <ns11:RecipientINN>3808008510</ns11:RecipientINN>
                  <ns11:RecipientKPP>380801001</ns11:RecipientKPP>
                  <ns11:BankName>ПАО СБЕРБАНК</ns11:BankName>
                  <ns11:PaymentRecipient>Тестовая организация1</ns11:PaymentRecipient>
                  <ns11:BankBIK>044525225</ns11:BankBIK>
                  <ns11:operatingAccountNumber>40703810000020105994</ns11:operatingAccountNumber>
                  <ns11:CorrespondentBankAccount>30101810400000000225</ns11:CorrespondentBankAccount>
                </ns13:PaymentInformation>
                <ns13:MailingAddress>komarov-ev@yandex.ru</ns13:MailingAddress>
              </ns13:ExecutorInformation>
              <ns13:Reminder>1155000.00</ns13:Reminder>
              <ns13:Purpose>40АА164719-01-7051</ns13:Purpose>
            </ns13:PaymentDocumentDetails>
            <ns4:Year>2017</ns4:Year>
            <ns4:Month>5</ns4:Month>
          </ns13:PaymentDocument>
        </ns13:Charge>
      </ns13:exportPaymentDocumentDetailsResult>
    </ns13:getStateResult>
  </ns2:Body>
</ns2:Envelope>
1) Why does Apache CXF ignore <?xml version="1.0" encoding="UTF-8" ?>, and why is UTF-8 not the default encoding?
2) How can I process a response as UTF-8 encoded even without charset=utf-8 in Content-Type header?
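One general point that may bear on the first question (an assumption about the mechanism, not verified against the CXF source): an XML parser can only act on the encoding declaration when it is handed raw bytes. If the bytes are first decoded with a charset taken from the transport layer, the declaration is never consulted. A minimal sketch using only the JDK's DOM parser; the class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XmlDeclarationDemo {

    // The city name from the sample response, written with Unicode escapes
    // so the source compiles regardless of the file encoding.
    static final String CITY = "\u042F\u0440\u043E\u0441\u043B\u0430\u0432\u043B\u044C";

    // A tiny UTF-8 encoded document that declares its own encoding.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?><city>" + CITY + "</city>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // The parser receives raw bytes, so it can read the XML declaration
    // and pick the decoder itself.
    static String parseFromBytes(byte[] body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(body))
                          .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The bytes are decoded by the caller before the parser sees them,
    // so the declared encoding has no effect on the result.
    static String parseWithCharset(byte[] body, Charset charset) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(body), charset);
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(reader))
                          .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = sampleBytes();
        System.out.println(CITY.equals(parseFromBytes(body)));
        System.out.println(CITY.equals(parseWithCharset(body, StandardCharsets.ISO_8859_1)));
        System.out.println(CITY.equals(parseWithCharset(body, StandardCharsets.UTF_8)));
    }
}
```

In this sketch only the ISO-8859-1 reader produces a value that differs from the original city name.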
I use Apache CXF together with WildFly 10.1.0.Final, but the same problem happens when I use Apache CXF on its own.
—
I also looked at the implementation.
Inside HTTPConduit, in the handleResponseInternal method, I found the following code:
String charset = HttpHeaderHelper.findCharset((String)inMessage.get(Message.CONTENT_TYPE));
String normalizedEncoding = HttpHeaderHelper.mapCharset(charset);
If there is no charset in the Content-Type header of the response, then normalizedEncoding is ISO-8859-1.
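The fallback behaviour described here can be illustrated with a small stand-alone sketch. This is not CXF's code: resolveCharset is a hypothetical helper written only to show how a missing charset parameter ends up as whatever default the caller supplies, and how a UTF-8 default would change the outcome for this response.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFallbackSketch {

    // Hypothetical helper (not part of CXF): returns the charset named in a
    // Content-Type value, or the given fallback when none is present.
    static Charset resolveCharset(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim().replace("\"", "");
                try {
                    if (!name.isEmpty() && Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // Illegal charset name: fall through to the fallback.
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The header from the sample response carries no charset parameter.
        System.out.println(resolveCharset("text/xml", StandardCharsets.ISO_8859_1));
        // With an explicit parameter the fallback is not used.
        System.out.println(resolveCharset("text/xml; charset=utf-8", StandardCharsets.ISO_8859_1));
        // A UTF-8 fallback would decode the sample response correctly.
        System.out.println(resolveCharset("text/xml", StandardCharsets.UTF_8));
    }
}
```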
If I set the value to UTF-8 in the debugger, it works fine and I get a valid result with Cyrillic characters instead of лавÑ...
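The garbled output can be reproduced outside CXF with plain JDK classes. The sketch below takes the region name from the sample response, encodes it as UTF-8 and decodes the bytes as ISO-8859-1. The class and method names are illustrative only, and the repair step works only because ISO-8859-1 maps every byte to exactly one character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // The region name from the sample response, written with Unicode escapes
    // so the source compiles regardless of the file encoding.
    static final String REGION =
        "\u042F\u0440\u043E\u0441\u043B\u0430\u0432\u0441\u043A\u0430\u044F";

    // Decodes a response body with the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    // Undoes a wrong ISO-8859-1 decode by re-encoding and decoding as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = REGION.getBytes(StandardCharsets.UTF_8);

        String wrong = decode(wire, StandardCharsets.ISO_8859_1);
        String right = decode(wire, StandardCharsets.UTF_8);

        System.out.println(wrong);          // garbled text
        System.out.println(right);          // the original region name
        System.out.println(repair(wrong));  // the original region name again
    }
}
```

The wrongly decoded string contains the same sequence quoted above, which is consistent with UTF-8 bytes being read as ISO-8859-1.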