The current implementation of GenericData.resolveUnion() (used when serializing an object) scans the union's branches sequentially, and because of that it showed up when we were doing some serialization performance analysis. A simple optimization can be implemented by keeping a map within the UnionSchema object (in fact, this could be a perfect hash map, given that the potential values in the map are known in advance). The optimization is most notable when a union within the schema contains many types (in our particular use case, more than 40 in some cases). In this scenario, we observed a 25% improvement by using an identity hash map.
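To illustrate the difference between the sequential scan and a cached lookup, here is a minimal sketch. It does not use Avro's classes: `BranchSchema` and `UnionSketch` are hypothetical stand-ins, and the real resolveUnion() logic may differ in how it matches a datum to a branch.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class UnionIndexSketch {
    /** Hypothetical stand-in for a branch schema; not Avro's Schema class. */
    static final class BranchSchema {
        final String name;
        BranchSchema(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a union that caches an identity-based index. */
    static final class UnionSketch {
        private final List<BranchSchema> branches;
        // Built once at construction time: branch instance -> position in the union.
        private final Map<BranchSchema, Integer> index = new IdentityHashMap<>();

        UnionSketch(List<BranchSchema> branches) {
            this.branches = new ArrayList<>(branches);
            for (int i = 0; i < this.branches.size(); i++) {
                index.put(this.branches.get(i), i);
            }
        }

        /** Sequential scan: cost grows with the number of branches. */
        int resolveSequential(BranchSchema schema) {
            for (int i = 0; i < branches.size(); i++) {
                if (branches.get(i) == schema) {
                    return i;
                }
            }
            return -1;
        }

        /** Indexed lookup: a single identity-hash probe. */
        int resolveIndexed(BranchSchema schema) {
            Integer position = index.get(schema);
            return position == null ? -1 : position;
        }
    }
}
```

Both methods return the same position for a branch instance that belongs to the union, and -1 otherwise. The identity map only recognizes the exact instances it was built from, which is the restriction mentioned in the next paragraph.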
Even though using an identity map provides a significant boost, we have observed a further improvement (and removed some of the restrictions of relying on object identity) by using a perfect hash map on the schema names (an extra 15% on top of that in some cases). Unfortunately, this implementation is not something we can contribute at this point, but we thought it would be a good idea to allow users to provide alternative implementations of the indexing behavior, such as by adding the following static method to Schema:
This is what the interface and identity hash map-based implementation would look like:
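The snippets originally attached here are not reproduced. As a rough idea of the shape such a hook could take, here is a hypothetical sketch: the names `BranchIndex`, `BranchIndexFactory`, `IdentityIndexFactory`, and `setBranchIndexFactory` are assumptions made for illustration and are not Avro's API or the contents of the attached patch.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PluggableIndexSketch {
    /** Hypothetical: maps a branch object to its position, or -1 if absent. */
    interface BranchIndex {
        int indexOf(Object branch);
    }

    /** Hypothetical: builds an index for one union's list of branches. */
    interface BranchIndexFactory {
        BranchIndex create(List<?> branches);
    }

    /** Default strategy in this sketch: identity-based lookup. */
    static final class IdentityIndexFactory implements BranchIndexFactory {
        @Override
        public BranchIndex create(List<?> branches) {
            final Map<Object, Integer> positions = new IdentityHashMap<>();
            for (int i = 0; i < branches.size(); i++) {
                positions.put(branches.get(i), i);
            }
            return branch -> {
                Integer position = positions.get(branch);
                return position == null ? -1 : position;
            };
        }
    }

    // Process-wide strategy, replaceable through the static setter below.
    private static volatile BranchIndexFactory factory = new IdentityIndexFactory();

    /** Hypothetical static hook that lets users swap in their own indexing. */
    static void setBranchIndexFactory(BranchIndexFactory newFactory) {
        factory = Objects.requireNonNull(newFactory);
    }

    /** Builds an index for the given branches using the current strategy. */
    static BranchIndex indexFor(List<?> branches) {
        return factory.create(branches);
    }
}
```

With a hook of this kind, a user could install a name-based or perfect-hash strategy by calling the setter once at startup, without the library having to ship that implementation.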
I will attach a patch later today or early tomorrow.
Thanks in advance,
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Assignee||Doug Cutting [ cutting ]|
|Fix Version/s||1.6.1 [ 12318847 ]|
|Resolution||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Transition||Time In Source Status||Execution Times||Last Executer||Last Execution Date|
|Open → Patch Available||3d 22h 20m||1||Hernan Otero||31/Oct/11 19:49|
|Patch Available → Resolved||4d 1h 9m||1||Doug Cutting||04/Nov/11 20:59|
|Resolved → Closed||10d 20h 31m||1||Doug Cutting||15/Nov/11 17:30|