Apache Avro / AVRO-3834

[Python] Incorrect decimal encoding/decoding

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11.2
    • Fix Version/s: None
    • Component/s: logical types, python
    • Labels: None
    • Environment: Python 3.10.3, Avro 1.11.2

       

    Description

      When encoding `decimal.Decimal` values using the Python avro library, the exponent of the value is largely ignored.

      This means that incorrect two's-complement values are calculated, so incorrect Avro data is produced.
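
      To illustrate what "ignoring the exponent" means, here is a minimal sketch using only the standard library. The helper name `digits_and_exponent` is made up for this example and is not part of avro. It shows that values such as 314, 31.4 and 3.14 share the same digits and differ only in the exponent reported by `Decimal.as_tuple()`:

```python
from decimal import Decimal


def digits_and_exponent(text):
    """Return (signed integer formed by the digits, exponent) for a decimal literal.

    Hypothetical helper for illustration only; not part of the avro library.
    """
    sign, digits, exp = Decimal(text).as_tuple()
    magnitude = int("".join(map(str, digits)))
    return (-magnitude if sign else magnitude, exp)


for text in ("314", "31.4", "3.14"):
    # The digits are identical in all three cases; only the exponent changes.
    print(text, digits_and_exponent(text))
```

      If only the digits are encoded, these three values become indistinguishable once written.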

      Here's a reasonably compact reproducer:

      import avro
      import avro.io
      import avro.schema  # imported explicitly because avro.schema.parse is used below
      from decimal import Decimal
      from io import BytesIO
      
      TESTS = [
          '314',
          '31',
          '3',
          '3.1',
          '31.4',
          '3.14',
          '3.141',
          '3.1415',
      ]
      
      if __name__ == '__main__':
          schema_text = '''{
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 8,
        "scale": 4
          }'''
          print(f"AVRO VERSION: {avro.__version__}")
          schema = avro.schema.parse(schema_text)
          writer = avro.io.DatumWriter(schema)
          reader = avro.io.DatumReader(schema)
      
          for val in TESTS:
              buf = BytesIO()
      
              val = Decimal(val)
              writer.write(val, avro.io.BinaryEncoder(buf))
              buf.seek(0)
              decoded_val = reader.read(avro.io.BinaryDecoder(buf))
              
              match = val == decoded_val
              result = 'PASS' if match else 'FAIL'
              print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val}   {result}')
       

      Which outputs:

      AVRO VERSION: 1.11.2
      Encoded: 314 -> b'\x04\x01:' -> 0.0314   FAIL
      Encoded: 31 -> b'\x02\x1f' -> 0.0031   FAIL
      Encoded: 3 -> b'\x02\x03' -> 0.0003   FAIL
      Encoded: 3.1 -> b'\x02\x1f' -> 0.0031   FAIL
      Encoded: 31.4 -> b'\x04\x01:' -> 0.0314   FAIL
      Encoded: 3.14 -> b'\x04\x01:' -> 0.0314   FAIL
      Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141   FAIL
      Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415   PASS
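
      The bytes in the output above can be read by hand. Avro writes `bytes` as a zigzag-varint length followed by the payload, and a decimal payload is the big-endian two's-complement form of the unscaled integer. The following is a small sketch with no avro dependency; the function names are hypothetical and chosen for this example:

```python
from decimal import Decimal


def read_zigzag_varint(data):
    """Decode an Avro long (zigzag varint); return (value, bytes consumed)."""
    raw = 0
    shift = 0
    for index, byte in enumerate(data):
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return (raw >> 1) ^ -(raw & 1), index + 1
        shift += 7
    raise ValueError("truncated varint")


def decode_decimal_bytes(data, scale):
    """Return (unscaled integer, Decimal) for a length-prefixed decimal payload."""
    length, offset = read_zigzag_varint(data)
    payload = data[offset:offset + length]
    unscaled = int.from_bytes(payload, byteorder="big", signed=True)
    return unscaled, Decimal(unscaled).scaleb(-scale)


# Three of the byte strings from the reproducer output, decoded with scale=4.
for raw in (b'\x04\x01:', b'\x02\x1f', b'\x04z\xb7'):
    print(raw, decode_decimal_bytes(raw, scale=4))
```

      Under these assumptions, `b'\x04\x01:'` carries the unscaled integer 314, which at scale 4 reads back as 0.0314, matching the FAIL rows for 314, 31.4 and 3.14.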

      The problem is that the code here:
      https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468
      does not use `exp` to shift the digits. `exp` is only checked against the scale for validation purposes.

      If you look at the output, the Avro bytes produced for '31.4' and '3.14' are identical, because `exp` is ignored.
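
      For comparison, here is a rough sketch of the encoding I would expect, where the digits are shifted by `scale + exp` before the two's-complement bytes are produced. This is not the library's code and not a proposed patch; `unscaled_value` and `encode_decimal_payload` are hypothetical names, and the sketch omits the length prefix that Avro writes before the payload:

```python
from decimal import Decimal


def unscaled_value(value, scale):
    """Return the integer value * 10**scale, raising if digits would be lost."""
    sign, digits, exp = value.as_tuple()
    if not isinstance(exp, int):
        raise ValueError("NaN and Infinity cannot be encoded")
    shift = scale + exp  # number of zeros to append to the digits
    if shift < 0:
        # Simplification: rejects any value with more fractional digits than
        # the scale, even when the extra digits are trailing zeros.
        raise ValueError(f"{value} has more fractional digits than scale {scale}")
    magnitude = int("".join(map(str, digits))) * 10 ** shift
    return -magnitude if sign else magnitude


def encode_decimal_payload(value, scale):
    """Big-endian two's-complement bytes of the unscaled value (no length prefix)."""
    unscaled = unscaled_value(value, scale)
    # One extra bit for the sign, rounded up to whole bytes. This can be one
    # byte longer than strictly necessary for some negative values.
    size = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(size, byteorder="big", signed=True)


for text in ("314", "31.4", "3.14", "3.1415"):
    print(text, unscaled_value(Decimal(text), 4), encode_decimal_payload(Decimal(text), 4))
```

      With the shift applied, '31.4' and '3.14' map to different unscaled integers (314000 and 31400 at scale 4), so they no longer collide.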

          People

            Assignee: Unassigned
            Reporter: Steve Stagg (stestagg)