  1. Apache NiFi
  2. NIFI-6288

character set encoding issue in FetchElasticsearchHttp processor


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.2
    • Fix Version/s: 1.10.0
    • Component/s: Extensions

    Description

      I used the FetchElasticsearchHttp processor to fetch documents from Elasticsearch that contain non-ASCII UTF-8 characters, e.g. accented characters of foreign languages or Japanese/Chinese characters.

      It worked as expected on platforms that have UTF-8 as the default file.encoding. On others, e.g. a SLES12 VM, the special characters in the document turned into "?" in the fetched output flow files.

       

      Taking a look at the source code showed:

      • AbstractElasticsearchProcessor declares the CHARSET property descriptor, but it was not added to AbstractElasticsearchHttpProcessor in the static initializer block.

      • The document source is written using the no-argument getBytes():

      out.write(source.toString().getBytes());

      which encodes with the JVM's default charset, so it will only work if the JVM's file.encoding is UTF-8.
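
      The difference between encoding with an explicit charset and relying on the platform default can be illustrated with a minimal sketch. This is not the NiFi source: the class CharsetWriteSketch and its methods writeSource and roundTrip are hypothetical names, and US-ASCII is used here only as a stand-in for a non-UTF-8 default file.encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the NiFi source: class and method names are made up.
class CharsetWriteSketch {

    // Encodes the document with an explicitly supplied charset instead of
    // relying on the JVM default, which is what the no-argument getBytes() uses.
    static void writeSource(OutputStream out, String source, Charset charset) throws IOException {
        out.write(source.getBytes(charset));
    }

    // Writes the document with one charset and decodes it again with the same one.
    static String roundTrip(String source, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeSource(out, source, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of the compiler's source encoding.
        String source = "{\"title\":\"\u65e5\u672c\u8a9e\"}";

        System.out.println("default charset: " + Charset.defaultCharset());
        // US-ASCII cannot represent the three Japanese characters, so each is replaced by '?'.
        System.out.println("US-ASCII keeps text: "
                + source.equals(roundTrip(source, StandardCharsets.US_ASCII)));
        System.out.println("UTF-8 keeps text: "
                + source.equals(roundTrip(source, StandardCharsets.UTF_8)));
    }
}
```

      In a processor, the charset argument would presumably be taken from the CHARSET property rather than hard-coded. That wiring is an assumption and is not shown here.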


      Attachments

        1. after-fix-encoded-message.png
          57 kB
          Endre Kovacs
        2. bad-encoded-message.png
          62 kB
          Endre Kovacs
        3. message-generator.png
          92 kB
          Endre Kovacs
        4. simple-flow-overview.png
          442 kB
          Endre Kovacs

            People

              Assignee: Endre Kovacs (andrewsmith87)
              Reporter: Endre Kovacs (andrewsmith87)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m