  1. Apache Avro
  2. AVRO-3572

Python encodes default value of bytes field as UTF-8

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.11.0
    • Fix Version/s: None
    • Component/s: python
    • Labels: None
    • Environment: Python 3.9.2

    Description

      The Avro spec says

      Default values for bytes and fixed fields are JSON strings, where Unicode code points 0-255 are mapped to unsigned 8-bit byte values 0-255.

      However, in the Avro library for Python, _read_default_value calls str.encode to convert the JSON string to bytes, and str.encode in Python 3 uses UTF-8 by default. This mis-encodes every code point from U+0080 through U+00FF, because UTF-8 emits two bytes for each of them instead of one. For example, the JSON string "\u0080" becomes the two bytes b'\xc2\x80', even though it should become the single byte b'\x80'.
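      The difference can be reproduced without the Avro library at all. The following is a minimal sketch, not the library's code or a proposed patch: decode_bytes_default is a hypothetical helper name, and it assumes that Python's "latin-1" (ISO-8859-1) codec is an acceptable way to get the mapping the spec describes, since that codec maps code points 0-255 one-to-one onto byte values 0-255.

```python
def decode_bytes_default(json_string: str) -> bytes:
    """Hypothetical helper (not part of the Avro library).

    Converts a JSON default string for a bytes/fixed field into bytes by
    mapping each code point 0-255 to the byte with the same value.
    Code points above 255 raise UnicodeEncodeError.
    """
    return json_string.encode("latin-1")


# What str.encode does with its default codec (UTF-8):
print("\u0080".encode())               # b'\xc2\x80'  (two bytes)

# What the spec's mapping calls for:
print(decode_bytes_default("\u0080"))  # b'\x80'      (one byte)

# ASCII-only defaults are unaffected; both codecs agree below 0x80.
print("abc".encode() == decode_bytes_default("abc"))  # True
```

      Under this assumption, a default containing a code point above U+00FF fails loudly with UnicodeEncodeError rather than silently producing extra bytes, which seems the safer behavior for a value the spec does not define.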

          People

            Assignee: Unassigned
            Reporter: Kalle Niemitalo (kniemitalo)