[AVRO-3184] Cache Datum Type Strings in Resolve Union - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.11.0
Component/s: None
Labels:
- pull-request-available

Description

GenericData.java

  protected String getSchemaName(Object datum) {
    if (datum == null || datum == JsonProperties.NULL_VALUE)
      return Type.NULL.getName();
    if (isRecord(datum))
      return getRecordSchema(datum).getFullName();
    if (isEnum(datum))
      return getEnumSchema(datum).getFullName();
    if (isArray(datum))
      return Type.ARRAY.getName();
    if (isMap(datum))
      return Type.MAP.getName();
    if (isFixed(datum))
      return getFixedSchema(datum).getFullName();
    if (isString(datum))
      return Type.STRING.getName();
    if (isBytes(datum))
      return Type.BYTES.getName();
    if (isInteger(datum))
      return Type.INT.getName();
    if (isLong(datum))
      return Type.LONG.getName();
    if (isFloat(datum))
      return Type.FLOAT.getName();
    if (isDouble(datum))
      return Type.DOUBLE.getName();
    if (isBoolean(datum))
      return Type.BOOLEAN.getName();
    throw new AvroRuntimeException(String.format("Unknown datum type %s: %s", datum.getClass().getName(), datum));
  }

This is a lot of work for the simple native types (Long, Float, Double, etc.). They are the last things checked, so a datum of one of these types falls through nearly every earlier test before it matches. Add a cache for these simple cases.
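One way such a cache could look is sketched below. This is only an illustration under stated assumptions, not necessarily what the linked pull request does: the class name `DatumTypeNameCache`, the method `cachedTypeName`, and the exact set of cached classes are hypothetical, and the string literals stand in for `Type.LONG.getName()` and friends so the example runs without Avro on the classpath.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper, not part of Avro: maps a boxed primitive's class
// straight to its Avro type name so the instanceof chain can be skipped.
final class DatumTypeNameCache {

  // Class objects are compared by reference, so an IdentityHashMap is enough.
  // The map is filled once in the static initializer and only read afterwards.
  private static final Map<Class<?>, String> PRIMITIVE_TYPE_NAMES = new IdentityHashMap<>();

  static {
    // Literals stand in for Type.INT.getName(), Type.LONG.getName(), etc.
    PRIMITIVE_TYPE_NAMES.put(Integer.class, "int");
    PRIMITIVE_TYPE_NAMES.put(Long.class, "long");
    PRIMITIVE_TYPE_NAMES.put(Float.class, "float");
    PRIMITIVE_TYPE_NAMES.put(Double.class, "double");
    PRIMITIVE_TYPE_NAMES.put(Boolean.class, "boolean");
    PRIMITIVE_TYPE_NAMES.put(String.class, "string");
  }

  private DatumTypeNameCache() {
  }

  // Returns the cached type name, or null when the datum's class is not in
  // the table and the caller should fall back to the full chain of checks.
  static String cachedTypeName(Object datum) {
    if (datum == null) {
      return "null";
    }
    return PRIMITIVE_TYPE_NAMES.get(datum.getClass());
  }

  public static void main(String[] args) {
    System.out.println(cachedTypeName(42L));          // prints "long"
    System.out.println(cachedTypeName(3.5f));         // prints "float"
    System.out.println(cachedTypeName(new Object())); // prints "null" (cache miss)
  }
}
```

In `getSchemaName`, a lookup like this would presumably run first and return on a hit, leaving the existing checks as the fallback for records, enums, arrays, maps, fixed, and bytes. A subclass that overrides `isInteger`, `isLong`, and so on might need to bypass or replace the table to keep its own behavior.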

I came across this while examining the performance of Apache ORC, which includes an Avro benchmark for comparison. The attached charts show the results with the change implemented.

Attachments

- AVRO-3184.JPG, 04/Aug/21 21:07, 113 kB, David Mollitor
- AVRO-master.JPG, 04/Aug/21 21:07, 117 kB, David Mollitor

Issue Links

links to GitHub Pull Request #1301

People

Assignee: David Mollitor

Reporter: David Mollitor

Votes: 0

Watchers: 2

Dates

Created: 04/Aug/21 21:06

Updated: 16/Sep/21 01:21

Resolved: 15/Sep/21 08:15

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2.5h