Details
Description
There is a performance penalty when reading flat (no nested fields) Avro tables: reading the same flat dataset in Pig takes half the time. Profiling shows that a lot of time is spent in AvroDeserializer.deserializeSingleItemNullableUnion(). The bulk of that time goes to GenericData.get().resolveUnion(), which calls GenericData.getSchemaName(Object datum), which in turn performs many instanceof checks. This could be simplified, with performance benefits. An approach is described in this patch, which almost halves the runtime.
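To illustrate the kind of simplification meant here, the following is a minimal, self-contained sketch and not the code from the patch. It assumes the union has exactly two branches, one of which is null, so the branch can be picked with a single null check instead of a chain of instanceof checks. The `Kind` enum and both method names are hypothetical stand-ins and are not part of the Avro or Hive APIs.

```java
import java.util.List;

public class NullableUnionSketch {
    // Hypothetical stand-in for a schema type; NOT org.apache.avro.Schema.
    enum Kind { NULL, INT, LONG, STRING, DOUBLE, BOOLEAN }

    // Generic resolution: classifies the datum through a chain of instanceof
    // checks on every call, then searches the union for the matching branch.
    static int resolveGeneric(List<Kind> union, Object datum) {
        Kind kind;
        if (datum == null) kind = Kind.NULL;
        else if (datum instanceof Integer) kind = Kind.INT;
        else if (datum instanceof Long) kind = Kind.LONG;
        else if (datum instanceof CharSequence) kind = Kind.STRING;
        else if (datum instanceof Double) kind = Kind.DOUBLE;
        else if (datum instanceof Boolean) kind = Kind.BOOLEAN;
        else throw new IllegalArgumentException("Unsupported datum: " + datum.getClass());
        int index = union.indexOf(kind);
        if (index < 0) throw new IllegalArgumentException("Not in union: " + kind);
        return index;
    }

    // Shortcut for a two-branch nullable union ([null, T] or [T, null]):
    // a non-null datum can only belong to the non-null branch.
    static int resolveNullable(List<Kind> union, Object datum) {
        if (union.size() != 2) {
            throw new IllegalArgumentException("Expected a two-branch union");
        }
        int nullIndex = union.get(0) == Kind.NULL ? 0 : 1;
        if (union.get(nullIndex) != Kind.NULL) {
            throw new IllegalArgumentException("Union has no null branch");
        }
        return datum == null ? nullIndex : 1 - nullIndex;
    }
}
```

Both methods return the same branch index for a valid datum. The shortcut skips type validation of the datum, which is only safe if the caller trusts that the data matches the schema.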
Attachments
Issue Links
- is related to HIVE-17394 AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row (Closed)