Apache NiFi / NIFI-12670

JoltTransform processors incorrectly encode/decode text in the Jolt Specification



    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-M1, 1.24.0, 1.25.0, 2.0.0-M2, 1.26.0, 2.0.0-M3
    • Fix Version/s: 2.0.0-M4
    • Component/s: Configuration, Extensions
    • Environment: JVM with non-UTF-8 default encoding (e.g. default Windows installation)



      This issue affects environments where the JVM default encoding is not UTF-8. Standard Java installations on Windows are affected, as they usually use the default encoding windows-1252. To reproduce the issue on Linux, change the default encoding to windows-1252 by adding the following line to your bootstrap.conf:



      The Jolt Specification of both the JoltTransformJSON and JoltTransformRecord processors is read internally using the system default encoding, even though it is always stored in UTF-8. This garbles non-ASCII characters in the Jolt Specification, which leads to incorrect transformations (missing data or garbled keys).

      Steps to reproduce

      1. Make sure NiFi runs with a non-UTF-8 default encoding, see "Environment"
      2. Create a GenerateFlowFile processor with the following content:
        { "regularString": … }
      3. Connect the processor to a JoltTransformJSON and/or JoltTransformRecord processor.
        (If using the record based processor, use a default JsonTreeReader and JsonRecordSetWriter. The record reader/writer don't affect this bug.)
        Set the Jolt Specification to:

            "operation": "shift",

        { "regularString": … }


      4. Connect the outputs of the Jolt processor(s) to funnels to be able to observe the result in the queue.
      5. Start the Jolt processor(s) and run the GenerateFlowFile processor once.
        The flow should look similar to this:

        I also attached a JSON export of the example flow.
      6. Observe the content of the resulting FlowFile(s) in the queue.

      Expected Result

      Actual Result

      • Remapped key containing non-ASCII characters is garbled, since the key value originated from the Jolt Specification.
      • The key "keyWithÜmlaut" could not be matched at all, since it contains non-ASCII characters, resulting in missing data in the output.
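
Both symptoms can be reproduced outside NiFi with a few lines of Java. The following is only a minimal sketch, not code from the processors: the class GarbledKeyDemo and the helper misdecode are made-up names, windows-1252 is passed explicitly to stand in for the JVM default, and a plain map lookup stands in for Jolt's key matching.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of NiFi or Jolt.
final class GarbledKeyDemo {

    // Encodes the text as UTF-8 (how the property is stored) and decodes
    // the bytes with a different charset (how the property is read).
    static String misdecode(String text, Charset wrongCharset) {
        byte[] storedBytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(storedBytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 plays the role of the JVM default charset.
        Charset windows1252 = Charset.forName("windows-1252");

        // \u00DC is "Ü"; the escape keeps this source independent of the compiler's encoding.
        String keyInSpec = "keyWith\u00DCmlaut";
        String keyAsRead = misdecode(keyInSpec, windows1252);

        // The two UTF-8 bytes of "Ü" (0xC3 0x9C) are decoded as two separate characters.
        System.out.println(keyAsRead.length() - keyInSpec.length()); // 1

        // Stand-in for matching input keys against the spec: a map lookup, not Jolt itself.
        Map<String, String> input = new HashMap<>();
        input.put(keyInSpec, "some value");
        System.out.println(input.get(keyInSpec)); // some value
        System.out.println(input.get(keyAsRead)); // null
    }
}
```

ASCII-only keys such as "regularString" pass through misdecode unchanged, which matches the observation that only keys and values with non-ASCII characters are affected.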

      Root Cause Analysis

      Both processors use the readTransform method of AbstractJoltTransform to read the Jolt Specification property. This method uses an InputStreamReader without specifying an encoding, so the reader falls back to the default charset of the environment. Text properties are always encoded in UTF-8. When the default charset is not UTF-8, the UTF-8 bytes are decoded with a different encoding when they are converted to a string, and the processors end up using a garbled Jolt Specification.
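
The difference between the two reader variants can be sketched in isolation. This is an illustration under assumptions, not the actual readTransform code or the committed fix: SpecReaderDemo and readAll are hypothetical names, and the charset is passed explicitly so the result does not depend on the JVM that runs the example. Calling new InputStreamReader(in) without a charset behaves like passing Charset.defaultCharset().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical demo class; not the NiFi implementation.
final class SpecReaderDemo {

    // Reads the stream line by line, decoding the bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u00DC is "Ü"; the property value is stored as UTF-8 bytes.
        String spec = "{ \"keyWith\u00DCmlaut\": \"renamed\" }";
        byte[] stored = spec.getBytes(StandardCharsets.UTF_8);

        // Assumption: windows-1252 stands in for a non-UTF-8 default charset.
        Charset assumedDefault = Charset.forName("windows-1252");
        String buggy = readAll(new ByteArrayInputStream(stored), assumedDefault);

        // Naming the charset explicitly makes the result independent of the environment.
        String fixed = readAll(new ByteArrayInputStream(stored), StandardCharsets.UTF_8);

        System.out.println(spec.equals(buggy)); // false
        System.out.println(spec.equals(fixed)); // true
    }
}
```

Passing StandardCharsets.UTF_8 to the InputStreamReader is one possible correction, since it matches the encoding in which text properties are stored.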


      This issue is not present when any attribute expression language is present in the Jolt Specification. Simply adding ${literal('')} anywhere in the Jolt Specification works around this issue.

      This happens because a different code path is used when expression language is present.
      I don't know why the property is even read line-by-line using a stream reader when no expression language is present. It seems like just using getValue() would work fine even without expression language, and that method doesn't have the encoding bug.


        1. image-2024-01-25-11-01-15-405.png
          101 kB
          René Zeidler
        2. image-2024-01-25-11-59-56-662.png
          5 kB
          René Zeidler
        3. image-2024-01-25-12-00-09-544.png
          6 kB
          René Zeidler
        4. Jolt_Transform_Encoding_Bug_M2.json
          22 kB
          René Zeidler
        5. Jolt_Transform_Encoding_Bug.json
          22 kB
          René Zeidler

              Assignee: Jim Steinebrey (jrsteinebrey)
              Reporter: René Zeidler (Rene_Z)



                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 0.5h