Avro: AVRO-1072

The JSON encoder doesn't handle non-ASCII characters properly

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.3
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None
    • Environment:

      Description

      The JSON encoder uses the platform's default encoding. It should always use UTF-8.
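      In plain JDK terms, the difference the report describes is whether a Writer is built as new OutputStreamWriter(out), which picks up the platform default charset, or with an explicit charset. The sketch below is not Avro's JsonEncoder code; the class and method names are hypothetical and only illustrate why pinning UTF-8 on both sides makes the result independent of each machine's default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Avro.
public class Utf8WriterSketch {

    // Writes text through a Writer bound to an explicit charset.
    // new OutputStreamWriter(out) without a charset would use the platform
    // default instead, which is the behaviour the report describes.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            // Unmappable chars are replaced with the charset's substitute ('?' for Latin-1).
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return out.toByteArray();
    }

    // Decodes with an explicit charset, so the receiver's platform default is irrelevant too.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String json = "{\"name\": \"\u4e2d\u6587 caf\u00e9\"}";
        String viaUtf8 = decode(encode(json, StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        String viaLatin1 = decode(encode(json, StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
        System.out.println(viaUtf8.equals(json));   // prints true
        System.out.println(viaLatin1.equals(json)); // prints false: the Chinese characters became "??"
    }
}
```

      String literals use \u escapes so the example itself does not depend on the source file's encoding.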

      This causes several problems for us:

      1. The text is mangled if the sending and receiving machines use different encodings.
      2. Some encodings (like Latin-1 or MacRoman) can't represent all characters (Chinese, for example), so those characters come through as ? in the text.
      3. Binary data (ByteBuffer values) doesn't survive encoding because of this problem.
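      The three symptoms above can be reproduced with JDK charset calls alone. This is a sketch, not Avro code: the class and method names are hypothetical, Latin-1 stands in for a mismatched platform default, and the bytes step assumes the Avro specification's JSON representation of bytes as a string whose code points 0-255 are the byte values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class, not part of Avro.
public class CharsetMismatchSketch {

    // Symptom 1: text written as UTF-8 but read with a Latin-1 default is mangled.
    static String writeUtf8ReadLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Symptom 2: Latin-1 cannot represent Chinese; String.getBytes substitutes '?'.
    static String latin1RoundTrip(String text) {
        return new String(text.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    // Symptom 3 (assumes the spec's mapping of bytes to code points 0-255):
    // the chars standing for the bytes must survive the text encoding step.
    static byte[] bytesThroughJsonText(ByteBuffer data, Charset writeAs, Charset readAs) {
        byte[] raw = new byte[data.remaining()];
        data.duplicate().get(raw); // copy without moving the caller's buffer position
        String asJsonString = new String(raw, StandardCharsets.ISO_8859_1); // byte b -> char (b & 0xff)
        String received = new String(asJsonString.getBytes(writeAs), readAs);
        return received.getBytes(StandardCharsets.ISO_8859_1); // char -> byte; non-Latin-1 chars become '?'
    }

    public static void main(String[] args) {
        System.out.println(writeUtf8ReadLatin1("caf\u00e9").equals("caf\u00c3\u00a9")); // prints true
        System.out.println(latin1RoundTrip("\u4e2d\u6587").equals("??"));               // prints true

        byte[] original = {0x00, 0x7f, (byte) 0x80, (byte) 0xff};
        ByteBuffer data = ByteBuffer.wrap(original);
        byte[] ok = bytesThroughJsonText(data, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        byte[] bad = bytesThroughJsonText(data, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(ok, original));  // prints true
        System.out.println(Arrays.equals(bad, original)); // prints false: 0x80 and 0xff are invalid UTF-8
    }
}
```

      In the mismatched case the lone bytes 0x80 and 0xff are malformed UTF-8, decode to U+FFFD, and come back as '?' (0x3f), so the original bytes cannot be recovered.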

        Activity

        Doug Cutting added a comment:

        Avro's JsonEncoder.java specifies the UTF-8 encoding, so I don't see how this is happening.

        Can you please provide a test that fails in your environment? Thanks!

        Zhihong Zhang created the issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Zhihong Zhang
          • Votes:
            0
            Watchers:
            0
