[NIFI-6288] character set encoding issue in FetchElasticsearchHttp processor - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.9.2
Fix Version/s: 1.10.0
Component/s: Extensions
Labels:
- easyfix

Description

I used the FetchElasticsearchHttp processor to fetch Elasticsearch documents that contain non-ASCII UTF-8 characters, e.g. accented characters or Japanese/Chinese characters.

This worked as expected on platforms whose default file.encoding is UTF-8. On other platforms, e.g. an SLES12 VM, the non-ASCII characters in the document were turned into "?" in the fetched output flow files.

A look at the source code showed two things:

AbstractElasticsearchProcessor declares the CHARSET property descriptor, but it was not added to AbstractElasticsearchHttpProcessor in the static initializer block.

In the place where the content of the document is written to the flow file (https://github.com/apache/nifi/blob/65c41ab917d7b5f323aa71d841cc03b29e12d480/nifi-nar-bundles/nifi-elasticsearch-bundle/nifi-elasticsearch-processors/src/main/java/org/apache/nifi/processors/elasticsearch/FetchElasticsearchHttp.java#L237), the code uses

out.write(source.toString().getBytes());

Calling getBytes() with no argument encodes with the JVM's default charset, so this only works correctly if the JVM's file.encoding is UTF-8.
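
The difference between the no-argument getBytes() and an explicit charset can be illustrated with a small standalone sketch. This is not the code from the pull request; the class name CharsetSketch and the helper encode are hypothetical, and US-ASCII is used only as a stand-in for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class, not part of NiFi.
class CharsetSketch {

    // Hypothetical helper: encode text with an explicitly chosen charset
    // instead of relying on the JVM default (file.encoding).
    public static byte[] encode(String source, Charset charset) {
        return source.getBytes(charset);
    }

    public static void main(String[] args) {
        // "café 日本" written with Unicode escapes so the source file encoding does not matter.
        String source = "caf\u00e9 \u65e5\u672c";

        // Platform-dependent: encodes with Charset.defaultCharset().
        byte[] platformBytes = source.getBytes();

        // Platform-independent: always UTF-8.
        byte[] utf8Bytes = encode(source, StandardCharsets.UTF_8);

        // Stand-in for a non-UTF-8 default: unmappable characters are replaced with '?'.
        byte[] asciiBytes = encode(source, StandardCharsets.US_ASCII);

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes equal UTF-8 bytes: "
                + Arrays.equals(platformBytes, utf8Bytes));
        System.out.println("decoded US-ASCII bytes: "
                + new String(asciiBytes, StandardCharsets.US_ASCII));
    }
}
```

Under these assumptions, passing the charset explicitly (for example one taken from a configured property) gives the same bytes on every platform, while the no-argument call depends on how the JVM was started.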

Attachments

- after-fix-encoded-message.png (57 kB, 10/May/19 11:49, Endre Kovacs)
- bad-encoded-message.png (62 kB, 10/May/19 11:49, Endre Kovacs)
- message-generator.png (92 kB, 10/May/19 11:49, Endre Kovacs)
- simple-flow-overview.png (442 kB, 10/May/19 11:49, Endre Kovacs)

Issue Links

links to

GitHub Pull Request #3467

People

Assignee: Endre Kovacs

Reporter: Endre Kovacs

Votes: 0

Watchers: 2

Dates

Created: 10/May/19 11:09

Updated: 10/May/19 13:33

Resolved: 10/May/19 13:33

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m