Apache Avro / AVRO-4069

Remove Reader String Cache from Generic Datum Reader


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0
    • Fix Version/s: 1.13.0
    • Component/s: java

    Description

      I was doing some profiling, and this "ReaderCache" code lit up:

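        static class ReaderCache {
          // ... fields (including stringClassCache) and the constructor are elided in this excerpt ...

          public Class getStringClass(final Schema s) {
            // A new IdentitySchemaKey is allocated on every call, even when the class is already cached.
            final IdentitySchemaKey key = new IdentitySchemaKey(s);
            return this.stringClassCache.computeIfAbsent(key, (IdentitySchemaKey k) -> this.findStringClass.apply(k.schema));
          }
        }

        private final ReaderCache readerCache = new ReaderCache(this::findStringClass);

        // Maps the schema's STRING_PROP ("avro.java.string") to the Java class used for string values.
        protected Class findStringClass(Schema schema) {
          String name = schema.getProp(GenericData.STRING_PROP);
          if (name == null)
            return CharSequence.class;

          switch (GenericData.StringType.valueOf(name)) {
          case String:
            return String.class;
          default:
            // CharSequence and Utf8 both map to CharSequence.class.
            return CharSequence.class;
          }
        }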

      The string cache here caches a single value per schema: the Java class selected by the schema's STRING_PROP. That is a lot of overhead for such a simple mapping. Every lookup allocates a new IdentitySchemaKey object and then does a map lookup, and this is a hot path. Performing the simple mapping on each invocation would take less time and put less pressure on the heap.
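      To make the trade-off concrete, here is a minimal, self-contained sketch that compares the two paths. SchemaStub, IdentityKey and CachedLookup are hypothetical stand-ins for Avro's Schema, IdentitySchemaKey and ReaderCache (Avro itself is not used), and the "avro.java.string" key is assumed to match GenericData.STRING_PROP. Both paths return the same class; the cached one additionally allocates a key object and performs a hash lookup on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class StringClassLookup {

  // Assumed to match GenericData.STRING_PROP in Avro.
  static final String STRING_PROP = "avro.java.string";

  // Hypothetical stand-in for org.apache.avro.Schema: only property lookup is modeled.
  static final class SchemaStub {
    private final Map<String, String> props;

    SchemaStub(Map<String, String> props) {
      this.props = props;
    }

    String getProp(String name) {
      return props.get(name);
    }
  }

  // Proposed path: recompute the mapping on every call (one property read, one string compare).
  // Simplification: unlike GenericData.StringType.valueOf, unknown names fall back to CharSequence.
  static Class<?> findStringClass(SchemaStub schema) {
    String name = schema.getProp(STRING_PROP);
    return "String".equals(name) ? String.class : CharSequence.class;
  }

  // Hypothetical analogue of IdentitySchemaKey: equality by reference, not by schema contents.
  static final class IdentityKey {
    final SchemaStub schema;

    IdentityKey(SchemaStub schema) {
      this.schema = schema;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof IdentityKey && ((IdentityKey) o).schema == schema;
    }

    @Override
    public int hashCode() {
      return System.identityHashCode(schema);
    }
  }

  // Current-style path for comparison: allocates an IdentityKey on every call, hit or miss.
  static final class CachedLookup {
    private final Map<IdentityKey, Class<?>> cache = new ConcurrentHashMap<>();
    private final Function<SchemaStub, Class<?>> compute;

    CachedLookup(Function<SchemaStub, Class<?>> compute) {
      this.compute = compute;
    }

    Class<?> getStringClass(SchemaStub s) {
      return cache.computeIfAbsent(new IdentityKey(s), k -> compute.apply(k.schema));
    }
  }

  public static void main(String[] args) {
    SchemaStub plain = new SchemaStub(Map.of());
    SchemaStub javaString = new SchemaStub(Map.of(STRING_PROP, "String"));
    CachedLookup cached = new CachedLookup(StringClassLookup::findStringClass);

    // Both paths agree; the direct one skips the key allocation and the hash lookup.
    System.out.println(findStringClass(plain).getSimpleName()); // CharSequence
    System.out.println(cached.getStringClass(javaString).getSimpleName()); // String
  }
}
```

      The direct path costs one property read and one string comparison with no allocation. Whether that is measurably faster inside GenericDatumReader would still need to be confirmed with a benchmark against the real classes.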

      Follow-on work: the property map in Schema is synchronized. Perhaps the map could be made non-synchronized, or Schema could explicitly cache this value in a non-synchronized way so that reading this one property is faster.
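      One way the second idea could look, sketched with a hypothetical SchemaStub class rather than the real Schema: the schema derives the string class when the property is written and stores it in a volatile field, so the hot read path is a single field load with no lock, map lookup, or allocation. Writes (addProp) are assumed to be rare and are serialized with synchronized so the stored class cannot drift from the property value. This is an illustration of the idea under those assumptions, not Avro's current design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StringClassMemo {

  // Assumed to match GenericData.STRING_PROP in Avro.
  static final String STRING_PROP = "avro.java.string";

  // Hypothetical schema-like class; not the real org.apache.avro.Schema.
  static final class SchemaStub {
    private final Map<String, String> props = new ConcurrentHashMap<>();

    // Derived from STRING_PROP and updated only by addProp; volatile so readers need no lock.
    private volatile Class<?> stringClass = CharSequence.class;

    // Writes are rare, so serializing them is cheap and keeps props and stringClass consistent.
    synchronized void addProp(String name, String value) {
      props.put(name, value);
      if (STRING_PROP.equals(name)) {
        stringClass = "String".equals(value) ? String.class : CharSequence.class;
      }
    }

    String getProp(String name) {
      return props.get(name);
    }

    // Hot path: a single volatile field read, no map lookup and no allocation.
    Class<?> getStringClass() {
      return stringClass;
    }
  }

  public static void main(String[] args) {
    SchemaStub s = new SchemaStub();
    System.out.println(s.getStringClass().getSimpleName()); // CharSequence
    s.addProp(STRING_PROP, "String");
    System.out.println(s.getStringClass().getSimpleName()); // String
  }
}
```

      The design pays the cost on the rare write instead of the frequent read. A lazily computed field would avoid the work at construction time, but it needs extra care to stay consistent with concurrent addProp calls.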

            People

              Assignee: David Mollitor (belugabehr)
              Reporter: David Mollitor (belugabehr)
              Votes: 0
              Watchers: 1


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h