Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.11.2
Environment: Python 3.10.3, Avro 1.11.2
Description
When encoding `decimal.Decimal` values using the Python Avro library, the exponent of the value is largely ignored.
This means that incorrect two's-complement values are calculated, and incorrect Avro data is produced.
Here's a reasonably compact reproducer:
    import avro
    import avro.io
    from decimal import Decimal
    from io import BytesIO

    TESTS = [
        '314',
        '31',
        '3',
        '3.1',
        '31.4',
        '3.14',
        '3.141',
        '3.1415',
    ]

    if __name__ == '__main__':
        schema_text = '''{
            "type": "bytes",
            "logicalType": "decimal",
            "precision": 8,
            "scale": 4
        }'''
        print(f"AVRO VERSION: {avro.__version__}")
        schema = avro.schema.parse(schema_text)
        writer = avro.io.DatumWriter(schema)
        reader = avro.io.DatumReader(schema)
        for val in TESTS:
            buf = BytesIO()
            val = Decimal(val)
            writer.write(val, avro.io.BinaryEncoder(buf))
            buf.seek(0)
            decoded_val = reader.read(avro.io.BinaryDecoder(buf))
            match = val == decoded_val
            result = 'PASS' if match else 'FAIL'
            print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val} {result}')
Which outputs:
    AVRO VERSION: 1.11.2
    Encoded: 314 -> b'\x04\x01:' -> 0.0314 FAIL
    Encoded: 31 -> b'\x02\x1f' -> 0.0031 FAIL
    Encoded: 3 -> b'\x02\x03' -> 0.0003 FAIL
    Encoded: 3.1 -> b'\x02\x1f' -> 0.0031 FAIL
    Encoded: 31.4 -> b'\x04\x01:' -> 0.0314 FAIL
    Encoded: 3.14 -> b'\x04\x01:' -> 0.0314 FAIL
    Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141 FAIL
    Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415 PASS
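To make the output easier to read, here is a small sketch that decodes the payload bytes by hand. It assumes the standard Avro layout for `bytes` (a zigzag-encoded length prefix, which is the leading `\x04` or `\x02` above, followed by the payload) and that the payload is a big-endian two's-complement unscaled integer. The helper name `decode_payload` is hypothetical and is not part of the avro library.

```python
from decimal import Decimal


def decode_payload(payload: bytes, scale: int) -> Decimal:
    """Interpret payload as a big-endian two's-complement unscaled integer
    and apply the schema scale (hypothetical helper, for illustration only)."""
    unscaled = int.from_bytes(payload, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# Payloads taken from the output above, with the length prefix stripped.
print(decode_payload(b"\x01:", 4))   # payload written for 314, 31.4 and 3.14
print(decode_payload(b"z\xb7", 4))   # payload written for 3.1415
```

The payload `b'\x01:'` is the integer 314, so with a scale of 4 it can only decode to 0.0314, whichever of the three inputs it was written for.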
The problem is that the code here:
https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468
does not use `exp` to shift the digits; `exp` is only checked to ensure it is not greater than the scale, for validation purposes.
If you look at the output, the Avro bytes produced for '31.4' and '3.14' are identical, because `exp` is ignored.
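As a rough illustration of what shifting the digits by `exp` could look like, here is a minimal standalone sketch. It is not the library's code or a proposed patch: the function names `unscaled_value` and `encode_decimal_payload` are hypothetical, precision checking is omitted, and only the payload bytes are produced (no Avro length prefix).

```python
from decimal import Decimal


def unscaled_value(datum: Decimal, scale: int) -> int:
    """Return the integer n such that datum == n * 10**-scale.

    Hypothetical helper: shifts the digits by (exp + scale) instead of
    ignoring the exponent.
    """
    sign, digits, exp = datum.as_tuple()
    if not isinstance(exp, int):
        raise ValueError("NaN and infinities cannot be encoded")
    if -exp > scale:
        raise ValueError("value has more fractional digits than the scale allows")
    n = int("".join(map(str, digits))) * 10 ** (exp + scale)
    return -n if sign else n


def encode_decimal_payload(datum: Decimal, scale: int) -> bytes:
    """Big-endian two's-complement bytes of the unscaled value."""
    n = unscaled_value(datum, scale)
    length = (n.bit_length() + 8) // 8  # always leaves room for the sign bit
    return n.to_bytes(length, "big", signed=True)


for text in ("314", "31.4", "3.14", "3.1415"):
    print(text, unscaled_value(Decimal(text), 4), encode_decimal_payload(Decimal(text), 4))
```

With the shift applied, '314', '31.4' and '3.14' map to the distinct unscaled values 3140000, 314000 and 31400, while '3.1415' still yields the same payload `b'z\xb7'` seen in the one passing case.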