Avro / AVRO-946

GenericData.resolveUnion() performance improvement

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.1
    • Component/s: java
    • Labels:
      None

      Description

      Due to the sequential nature of today's implementation of GenericData.resolveUnion() (used when serializing an object):

        public int resolveUnion(Schema union, Object datum) {
          int i = 0;
          for (Schema type : union.getTypes()) {
            if (instanceOf(type, datum))
              return i;
            i++;
          }
          throw new UnresolvedUnionException(union, datum);
        }
      

      it showed up as a hotspot when we were doing some serialization performance analysis. A simple optimization is to keep a map within the UnionSchema object (this could even be a perfect hash map, since the possible keys are known in advance). The optimization is most noticeable when a union within the schema contains many types (more than 40 in some of our use cases). In that scenario, we observed a 25% improvement by using an identity hash map.
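To make the difference concrete, here is a minimal sketch that contrasts a sequential scan with an identity-map lookup. It is not Avro code: `Branch` and `UnionLookupSketch` are hypothetical stand-ins for Schema and the union, and the scan compares references rather than calling instanceOf().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an Avro Schema; only object identity matters here.
final class Branch {
    final String name;
    Branch(String name) { this.name = name; }
}

// Hypothetical stand-in for a union that indexes its branches up front.
final class UnionLookupSketch {
    private final List<Branch> branches;
    private final Map<Branch, Integer> indexByIdentity = new IdentityHashMap<>();

    UnionLookupSketch(List<Branch> branches) {
        this.branches = new ArrayList<>(branches);
        // Built once; the set of branches is known when the union is created.
        for (int i = 0; i < this.branches.size(); i++) {
            indexByIdentity.put(this.branches.get(i), i);
        }
    }

    // Same shape as the loop above: cost grows with the number of branches.
    int indexBySequentialScan(Branch target) {
        int i = 0;
        for (Branch b : branches) {
            if (b == target) {
                return i;
            }
            i++;
        }
        return -1;
    }

    // Single hash lookup, independent of the branch's position in the union.
    int indexByMap(Branch target) {
        Integer i = indexByIdentity.get(target);
        return i == null ? -1 : i;
    }

    public static void main(String[] args) {
        List<Branch> branches = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            branches.add(new Branch("Record" + i));
        }
        UnionLookupSketch union = new UnionLookupSketch(branches);
        Branch last = branches.get(branches.size() - 1);
        System.out.println(union.indexBySequentialScan(last));
        System.out.println(union.indexByMap(last));
    }
}
```

Both methods return the same index; the map version simply avoids walking past the earlier branches, which is where a 40-way union would spend its time.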

      Although an identity map provides a significant boost, we observed a further improvement (an extra 15% on top of that in some cases) by using a perfect hash map keyed on the schema names, which also removes some of the restrictions of relying on object identity. Unfortunately, that implementation is not something we can contribute at this point. Instead, we thought it would be a good idea to allow users to provide alternative implementations of the indexing behavior, for example by adding the following static method to Schema:

      public static void setUnionTypeIndexCacheFactory(UnionIndexCacheFactory factory)
      {
        unionIndexCacheFactory = factory;
      }
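One way such a hook might be wired is sketched below, under the assumption that each union builds its cache once at construction time using whichever factory is installed at that moment. All names here (`TypeIndexCache`, `TypeIndexCacheFactory`, `UnionSketch`) are hypothetical and keyed on type names for brevity; they are not Avro API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache: maps a type name to its position in the union, or -1.
interface TypeIndexCache {
    int indexOf(String typeName);
}

// Hypothetical factory, analogous in role to the proposed UnionIndexCacheFactory.
interface TypeIndexCacheFactory {
    TypeIndexCache create(List<String> typeNames);
}

final class UnionSketch {
    // Default factory: a plain HashMap from type name to position.
    // volatile so that a factory set on one thread is visible to others.
    private static volatile TypeIndexCacheFactory factory = typeNames -> {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < typeNames.size(); i++) {
            map.put(typeNames.get(i), i);
        }
        return name -> map.getOrDefault(name, -1);
    };

    // Counterpart of the proposed static setter.
    static void setFactory(TypeIndexCacheFactory newFactory) {
        factory = newFactory;
    }

    private final TypeIndexCache cache;

    UnionSketch(List<String> typeNames) {
        // The cache is created once, with the factory installed right now.
        this.cache = factory.create(typeNames);
    }

    int indexOf(String typeName) {
        return cache.indexOf(typeName);
    }
}
```

Under this assumption, a custom factory only affects unions created after the setter is called, so it would need to be installed before any schemas are parsed.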
      

      This is what the interface and identity hash map-based implementation would look like:

        /**
         * A factory interface for creating UnionIndexCache instances.
         */
        public static interface UnionIndexCacheFactory
        {
            UnionIndexCache createUnionIndexCache(List<Schema> types);
      
            /**
             * Used for caching schema indices within a union.
             */
            public static interface UnionIndexCache
            {
                void setTypeIndex(Schema schema, int index);
      
                int getTypeIndex(Schema schema);
            }
      
        }
      
        private static class IdentityMapUnionIndexCacheFactory implements UnionIndexCacheFactory
        {
            @Override
            public UnionIndexCache createUnionIndexCache(List<Schema> types)
            {
                return new UnionIndexCache()
                {
                    private final IdentityHashMap<Schema, Integer> schemaToIndex = new IdentityHashMap<Schema, Integer>();
      
                    @Override
                    public void setTypeIndex(Schema schema, int index)
                    {
                        schemaToIndex.put(schema, index);
                    }
      
                    @Override
                    public int getTypeIndex(Schema schema)
                    {
                        Integer index = schemaToIndex.get(schema);
                        return index == null ? -1 : index;
                    }
                };
            }
        }
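For comparison, here is a sketch of a name-keyed cache next to an identity-keyed one. It illustrates the identity restriction mentioned earlier: two distinct objects describing the same type miss in an identity map but hit in a name-keyed map. This uses an ordinary HashMap, not the perfect hash map described above, and `NamedType`, `IndexCache`, `IdentityIndexCache`, and `NameIndexCache` are hypothetical stand-ins rather than Avro classes.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-in for Avro's Schema: it only carries a full name.
final class NamedType {
    private final String fullName;
    NamedType(String fullName) { this.fullName = fullName; }
    String getFullName() { return fullName; }
}

// Mirrors the shape of the UnionIndexCache interface above.
interface IndexCache {
    void setTypeIndex(NamedType type, int index);
    int getTypeIndex(NamedType type);
}

// Keyed on object identity: only the exact registered instance is found.
final class IdentityIndexCache implements IndexCache {
    private final Map<NamedType, Integer> byIdentity = new IdentityHashMap<>();

    @Override
    public void setTypeIndex(NamedType type, int index) {
        byIdentity.put(type, index);
    }

    @Override
    public int getTypeIndex(NamedType type) {
        Integer index = byIdentity.get(type);
        return index == null ? -1 : index;
    }
}

// Keyed on the type's full name: any instance with the same name is found.
final class NameIndexCache implements IndexCache {
    private final Map<String, Integer> byName = new HashMap<>();

    @Override
    public void setTypeIndex(NamedType type, int index) {
        byName.put(type.getFullName(), index);
    }

    @Override
    public int getTypeIndex(NamedType type) {
        Integer index = byName.get(type.getFullName());
        return index == null ? -1 : index;
    }
}
```

A caller would treat -1 as a cache miss and could fall back to the sequential scan, so either cache can be introduced without changing behavior for unindexed types.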
      

      I will attach a patch later today or early tomorrow.

      Thanks in advance,

      Hernan Otero

      1. AVRO-946.patch
        6 kB
        Doug Cutting
      2. AVRO-946.patch
        5 kB
        Doug Cutting
      3. AVRO-946.patch
        4 kB
        Hernan Otero

          People

          • Assignee:
            Doug Cutting
            Reporter:
            Hernan Otero