Avro
  1. Avro
  2. AVRO-607

SpecificData.getSchema not thread-safe

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.3.3
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None

      Description

      SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap is not thread-safe, and the method itself is not synchronized. Seems like this could lead to the data structure getting corrupted.

      1. AVRO-607.patch
        0.9 kB
        Doug Cutting

        Activity

        Hide
        Doug Cutting added a comment -

        Here's a simple patch. I've converted it to use ConcurrentHashMap instead. We already have static ConcurrentHashMaps that contain instances of Class, and most instances of Type are Class. So this will prevent classes and types from getting GC'd, but neither need to be GC'd much.

        A better fix might be to update all uses of static ConcurrentHashMaps with 'new com.google.common.collect.
        new MapMaker().weakKeys().makeMap()'.

        http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html

        Should we add a dependency on Google Collections?

        Show
        Doug Cutting added a comment - Here's a simple patch. I've converted it to use ConcurrentHashMap instead. We already have static ConcurrentHashMaps that contain instances of Class, and most instances of Type are Class. So this will prevent classes and types from getting GC'd, but neither need to be GC'd much. A better fix might be to update all uses of static ConcurrentHashMaps with 'new com.google.common.collect. new MapMaker().weakKeys().makeMap()'. http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html Should we add a dependency on Google Collections?
        Hide
        Scott Carey added a comment -

        I quite like Guava, Having a concurrent weak hash map is great, and the Immutable collections are very useful, and several other collection types are massive time savers (Multiset, Multimap and BiMap).

        However, items get deprecated and dissapear in 2 years in Guava, so we would have to avoid the newest APIs and quickly move off of deprecated ones to prevent users who also use it from coming into conflict. It is manageable, but it is a dependency that is very likely to be used by our users, and if we are on version 11 while a user is on 13, we could be in a position where neither version works for both of us simultaneously. I also worry about our place as a library far down the stack for some users.

        We could complicate our build to shade in only the classes we use under a different namespace to avoid such problems (this may be useful for other dependencies as well).

        Show
        Scott Carey added a comment - I quite like Guava, Having a concurrent weak hash map is great, and the Immutable collections are very useful, and several other collection types are massive time savers (Multiset, Multimap and BiMap). However, items get deprecated and dissapear in 2 years in Guava, so we would have to avoid the newest APIs and quickly move off of deprecated ones to prevent users who also use it from coming into conflict. It is manageable, but it is a dependency that is very likely to be used by our users, and if we are on version 11 while a user is on 13, we could be in a position where neither version works for both of us simultaneously. I also worry about our place as a library far down the stack for some users. We could complicate our build to shade in only the classes we use under a different namespace to avoid such problems (this may be useful for other dependencies as well).
        Hide
        Scott Carey added a comment -

        Alternative to this patch, we could synchronize the method, or we can use a ThreadLocal<WeakHashMap<Type, Schema>>.

        A cache with a Type or Class key (that is not weak) that becomes static can lead to classloader leaks.

        In Avro code, a weak concurrent hash map is in high demand.

        Show
        Scott Carey added a comment - Alternative to this patch, we could synchronize the method, or we can use a ThreadLocal<WeakHashMap<Type, Schema>>. A cache with a Type or Class key (that is not weak) that becomes static can lead to classloader leaks. In Avro code, a weak concurrent hash map is in high demand.
        Hide
        Doug Cutting added a comment -

        > In Avro code, a weak concurrent hash map is in high demand.

        Might we add a org.apache.avro.util.WeakConcurrentMap<T> class that's a ConncurrentHashMap<WeakReference<T>>? We already have our own WeakIdentityHashMap...

        Show
        Doug Cutting added a comment - > In Avro code, a weak concurrent hash map is in high demand. Might we add a org.apache.avro.util.WeakConcurrentMap<T> class that's a ConncurrentHashMap<WeakReference<T>>? We already have our own WeakIdentityHashMap...
        Hide
        Scott Carey added a comment -

        If I recall, that turns out to be very hard due to how the equals contract works with weak references. There is already a Java WeakHashMap, so making one with identity semantics wasn't too hard.
        We may need thousands of lines of code and might have to implement our own concurrent map implementation. I think I'd rather spend my efforts figuring out how to extract Google's implementation into another namespace in the build with shade, jarjar, or similar.

        Show
        Scott Carey added a comment - If I recall, that turns out to be very hard due to how the equals contract works with weak references. There is already a Java WeakHashMap, so making one with identity semantics wasn't too hard. We may need thousands of lines of code and might have to implement our own concurrent map implementation. I think I'd rather spend my efforts figuring out how to extract Google's implementation into another namespace in the build with shade, jarjar, or similar.

          People

          • Assignee:
            Unassigned
            Reporter:
            Stephen Tu
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development