Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
GenericData.java
protected String getSchemaName(Object datum) {
  if (datum == null || datum == JsonProperties.NULL_VALUE)
    return Type.NULL.getName();
  if (isRecord(datum))
    return getRecordSchema(datum).getFullName();
  if (isEnum(datum))
    return getEnumSchema(datum).getFullName();
  if (isArray(datum))
    return Type.ARRAY.getName();
  if (isMap(datum))
    return Type.MAP.getName();
  if (isFixed(datum))
    return getFixedSchema(datum).getFullName();
  if (isString(datum))
    return Type.STRING.getName();
  if (isBytes(datum))
    return Type.BYTES.getName();
  if (isInteger(datum))
    return Type.INT.getName();
  if (isLong(datum))
    return Type.LONG.getName();
  if (isFloat(datum))
    return Type.FLOAT.getName();
  if (isDouble(datum))
    return Type.DOUBLE.getName();
  if (isBoolean(datum))
    return Type.BOOLEAN.getName();
  throw new AvroRuntimeException(String.format("Unknown datum type %s: %s", datum.getClass().getName(), datum));
}
This is a lot of work for the simple native types (Long, Float, Double, etc.). They are the last things checked, so each lookup first falls through every record, enum, array, map, fixed, string, and bytes test. Add a cache for these simple cases.
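One way such a cache could look is sketched below. This is a minimal illustration, not the patch that was committed: the class name PrimitiveSchemaNames and the method lookup are hypothetical, and the string literals assume that Avro's Type.X.getName() returns the lowercase type name. The idea is to map the datum's class directly to its schema name, and to return null for anything else so the caller can fall back to the existing chain of checks.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper (not part of Avro): maps boxed Java classes
// straight to the Avro primitive type name, skipping the isRecord/
// isEnum/isArray/... chain for the most common simple datums.
final class PrimitiveSchemaNames {
    // Class objects are compared by identity, so IdentityHashMap is sufficient.
    // The map is populated once in the static initializer and only read afterwards.
    private static final Map<Class<?>, String> CACHE = new IdentityHashMap<>();

    static {
        // Assumption: these literals match Type.INT.getName(), Type.LONG.getName(), etc.
        CACHE.put(Integer.class, "int");
        CACHE.put(Long.class, "long");
        CACHE.put(Float.class, "float");
        CACHE.put(Double.class, "double");
        CACHE.put(Boolean.class, "boolean");
    }

    private PrimitiveSchemaNames() {}

    // Returns the cached schema name, or null when the datum is null or
    // not a cached simple type; the caller then runs the full check chain.
    static String lookup(Object datum) {
        return datum == null ? null : CACHE.get(datum.getClass());
    }
}
```

In getSchemaName, a call such as PrimitiveSchemaNames.lookup(datum) could run right after the null check, returning early on a hit. One caveat of this sketch: a subclass that overrides isInteger, isLong, and so on to change how these classes are classified would be bypassed by the cache, so a real change would need to account for that.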
I came across this while examining performance of Apache ORC which includes an Avro benchmark for comparison. You can see the charts with the change implemented.
Attachments