Apache Avro / AVRO-607

SpecificData.getSchema not thread-safe

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.3.3, 1.8.1
    • Fix Version/s: 1.8.2
    • Component/s: java
    • Labels:
    • Release Note:
      Method org.apache.avro.specific.SpecificData::getSchema(type) now uses Guava's LoadingCache instead of a Map to avoid race conditions.

      Description

      SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap is not thread-safe and the method itself is not synchronized, so it seems like concurrent access could corrupt the data structure.

      1. 0001-AVRO-607-Changed-SpecificData.getSchema-to-use-a-thr.patch
        3 kB
        Andrius Druzinis-Vitkus
      2. AVRO-607.patch
        3 kB
        Andrius Druzinis-Vitkus
      3. AVRO-607.patch
        0.9 kB
        Doug Cutting


          Activity

          Doug Cutting added a comment -

          Here's a simple patch. I've converted it to use ConcurrentHashMap instead. We already have static ConcurrentHashMaps that contain instances of Class, and most instances of Type are Class. So this will prevent classes and types from getting GC'd, but neither needs to be GC'd much.

          A better fix might be to replace all uses of static ConcurrentHashMaps with 'new com.google.common.collect.MapMaker().weakKeys().makeMap()'.

          http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/MapMaker.html

          Should we add a dependency on Google Collections?

          Scott Carey added a comment -

          I quite like Guava. Having a concurrent weak hash map is great, the immutable collections are very useful, and several other collection types are massive time savers (Multiset, Multimap and BiMap).

          However, items get deprecated in Guava and disappear within 2 years, so we would have to avoid the newest APIs and quickly move off of deprecated ones to avoid conflicting with users who also depend on it. That is manageable, but it is a dependency that is very likely to be used by our users, and if we are on version 11 while a user is on 13, we could be in a position where neither version works for both of us simultaneously. I also worry about our place as a library far down the stack for some users.

          We could complicate our build to shade in only the classes we use under a different namespace to avoid such problems (this may be useful for other dependencies as well).

          Scott Carey added a comment -

          As an alternative to this patch, we could synchronize the method, or we could use a ThreadLocal<WeakHashMap<Type, Schema>>.

          A static cache keyed on a Type or Class, if the keys are not held weakly, can lead to classloader leaks.

          In Avro code, a weak concurrent hash map is in high demand.

          Doug Cutting added a comment -

          > In Avro code, a weak concurrent hash map is in high demand.

          Might we add an org.apache.avro.util.WeakConcurrentMap<T> class that's a ConcurrentHashMap<WeakReference<T>>? We already have our own WeakIdentityHashMap...

          Scott Carey added a comment -

          If I recall correctly, that turns out to be very hard because of how the equals contract interacts with weak references. There is already a Java WeakHashMap, so making one with identity semantics wasn't too hard.
          We might need thousands of lines of code and could end up implementing our own concurrent map. I think I'd rather spend my efforts figuring out how to extract Google's implementation into another namespace in the build with shade, jarjar, or similar.

          Dave Latham added a comment -

          This is affecting us in production; the symptom is hundreds of threads stuck in a loop at:

          	at java.util.WeakHashMap.get(WeakHashMap.java:470)
          	at org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:138)
          	at org.apache.avro.specific.SpecificDatumWriter.<init>(SpecificDatumWriter.java:33)
          

          until the process is restarted.

          ASF GitHub Bot added a comment -

          GitHub user rdblue opened a pull request:

          https://github.com/apache/avro/pull/30

          AVRO-607. Make SpecificData#getSchema thread-safe.

          This adds a version of WeakIdentityHashMap that is thread-safe by
          replacing the backing Map implementation with ConcurrentHashMap. Both
          WeakIdentityHashMap and the new WeakIdentityConcurrentMap are subclasses
          of the (new) WeakIdentityMap, which allows subclasses to supply a
          backing map implementation.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/rdblue/avro AVRO-607-concurrent-schema-cache

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/30.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #30


          commit b287db7560928d88b1ef1dd0cc79d1ad3e7cfa1e
          Author: Ryan Blue <blue@apache.org>
          Date: 2015-03-31T00:31:36Z

          AVRO-607. Make SpecificData#getSchema thread-safe.

          This adds a version of WeakIdentityHashMap that is thread-safe by
          replacing the backing Map implementation with ConcurrentHashMap. Both
          WeakIdentityHashMap and the new WeakIdentityConcurrentMap are subclasses
          of the (new) WeakIdentityMap, which allows subclasses to supply a
          backing map implementation.


          Ryan Blue added a comment -

          Dave Latham's comment is a good excuse to start this thread again. From the discussion above, it sounds like building the WeakConcurrentMap is a problem because WeakReference's equals method doesn't override the default, so two WeakReferences to the same object aren't equal. But WeakIdentityHashMap creates a subclass of WeakReference that overrides equals to avoid that problem, and it would be appropriate to use an IdentityHashMap for this cache. So I created a version of the WeakIdentityHashMap that is backed by a ConcurrentHashMap instead of a regular HashMap. Even if the GC removes the weakly-referenced object while the ConcurrentHashMap is doing some operation, it either removes all weak references or none, so key equality is preserved.

          I also caught a couple of places where I think the current implementation violates its contract. For example, keySet gets all of the identity weak references and accumulates a set of the real values. As soon as each real value is referenced, we know that the weak reference won't disappear. But I think there's a small window between reaping the current reference set and adding all of the referenced objects to the key set during which GC can run and remove values:

            public Set<K> keySet() {
              reap();
              Set<K> ret = new HashSet<K>();
              for (IdentityWeakReference<K> ref : backingStore.keySet()) {
                // half-way through the loop, GC could run and remove the next referenced object
                ret.add(ref.get());
              }
              return Collections.unmodifiableSet(ret);
            }
          

          I've also added checks so that in keySet and entrySet, the WeakReference is resolved (which prevents losing the value) and checked to see if it is null before adding to the output collection.

          I've added PR #30 on github, and I can upload a patch if that's preferred.

          Ryan Blue added a comment -

          Gwen Shapira or Tom White, do you have time to take a look?

          Tom White added a comment -

          > I think I'd rather spend my efforts figuring out how to extract Google's implementation into another namespace in the build with shade, jarjar, or similar.

          I agree. Shading Guava is not too hard, and means that we'd be able to use well-tested code.

          Ryan Blue added a comment -

          Shading Guava isn't too bad, but it will increase the size of the Avro jar by 1.6 MB and make the build more complex. I'm not sure that's worth it just to replace this class.

          ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/avro/pull/30

          Daniel Halperin added a comment -

          Is this bug still open? It's also affecting some users of Apache Beam (incubating) in production.

          Ryan Blue added a comment -

          Yes, this is still open. We've added Guava, so we just need to convert the cache to the appropriate thread-safe Guava one. I'll mark this for the next release.

          Daniel Halperin added a comment -

          Thanks Ryan.

          So if I'm reading this right, the only way to avoid this bug is to serialize access to SpecificData.getSchema. Right now, we're hitting this via the following code path:

          --- Thread: Thread[pool-1-thread-15,5,main] State: RUNNABLE stack: ---
            java.util.WeakHashMap.get(WeakHashMap.java:403)
            org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:187)
            org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:168)
            org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:612)
            org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
            org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:601)
            org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
            org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
            org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
            org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
            org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
            org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
            org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
            org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
            org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
            org.apache.beam.sdk.coders.AvroCoder.encode(AvroCoder.java:264)
          

          We have many different threads all calling AvroCoder#encode (on different DatumWriter instances), which eventually reflectively uses the static cache.

          Are there any standard ways to populate the cache to avoid this problem?

          Ryan Blue added a comment -

          Yeah, the getSchema method is public, so you could synchronize calls to it in your own classes.

          Daniel Halperin added a comment -

          Thanks, Ryan!

          Sean Busbey added a comment -

          FYI, I added this to the help wanted tracker (helpwanted.apache.org); I can't find a permalink to the particular task.

          Chunfy.Tseng added a comment -

          When using RPC, can the client use tools like HttpClient, or something similar, or Postman? Could you please give me an example? Thank you!

          ASF GitHub Bot added a comment -

          GitHub user dhalperi opened a pull request:

          https://github.com/apache/incubator-beam/pull/664

          Upgrade to Avro 1.8.1

          This version has some important fixes for potential memory leaks.

          (It does not have a fix for AVRO-607 yet, but still has good fixes.)

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dhalperi/incubator-beam avro-upgrade

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/incubator-beam/pull/664.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #664


          commit 313a3e60042c86c0130c499ed5503380dab251cd
          Author: Dan Halperin <dhalperi@google.com>
          Date: 2016-07-15T15:43:35Z

          Upgrade to Avro 1.8.1

          This version has some important fixes for potential memory leaks.

          (It does not have a fix for AVRO-607 yet, but still has good fixes.)


          Daniel Halperin added a comment -

          Sorry folks, I did not think the bot would comment on this issue unless this was in the first line of the commit.

          Sean Busbey added a comment -

          No worries. It helps to know the relative demand for outstanding issues.

          Andrius Druzinis-Vitkus added a comment -

          Hi all, I would like to propose a fix based on Guava's caches. The advantage of a cache over a (synchronised) map is that caches offer atomic "get-if-absent-compute" semantics, which should prevent race conditions. Caches also offer a 'weak keys' option (and weak values, if needed).

          Since I can't see an Attach patch button, I've added the patch to a GitHub gist:
          https://gist.github.com/driu/51238a26f3a980d2b91222f788ff651b

          Sean Busbey added a comment -

          Hi Andrius Druzinis-Vitkus! I've assigned the jira to you now, so you ought to be able to attach a patch. Please either do that or create a pull request (and reference this jira there).

          Sean Busbey added a comment -

          Thanks for the patch. Could you regenerate it using the git format-patch command? That way I'll know how you want to be attributed.

          Sean Busbey added a comment -
          +import com.google.common.cache.CacheBuilder;
          +import com.google.common.cache.CacheLoader;
          +import com.google.common.cache.LoadingCache;
          

          If these are in the shaded module, shouldn't they be relocated from the com.google namespace?

          Andrius Druzinis-Vitkus added a comment - edited

          Hi Sean, if I understand correctly, the avro module depends on avro-guava-dependencies (the shaded slimmed-down jar) but not com.google.guava; in order to make a class available in avro, it needs to be referenced in GuavaClasses. At least, that is how I read this comment:

          GuavaClasses.java
            /*
             * Referencing Guava classes here includes them in the minimized Guava jar
             * that is shaded in the avro jar.
             */
          

          I wasn't sure which namespace to use in SpecificData.java and chose the com.google one because classes GenericData and ReflectData also import from com.google and not org.apache.avro:

          GenericData.java
          import com.google.common.collect.MapMaker;
          
          ReflectData.java
          import com.google.common.collect.MapMaker;
          

          Is the above reasoning not correct? I will update the patch once we clear this up.

          Sean Busbey added a comment -

          Hurm. Let me try to figure it out today. I thought our goal was to avoid exposing our dependency on Guava, but maybe we're rewriting the referenced classes in the build.

          Andrius Druzinis-Vitkus added a comment -

          Hi Sean Busbey, any updates on this?

          Ryan Blue added a comment -

          Sorry, I didn't see this until now. The org.apache.avro:avro artifact shades the Guava artifact and relocates its classes. That way, Avro depends on Guava classes like normal and they aren't leaked after that. So the normal import lines are correct.

          Ryan Blue added a comment -

          Andrius Druzinis-Vitkus, do you know how much larger this makes the avro-guava jar? This is shaded, so we want to keep it small if possible.

          Andrius Druzinis-Vitkus added a comment - edited

          The size went up from 852 KB to 1 MB. Is this an acceptable increase?

          These added classes would potentially help elsewhere in the codebase where atomic "get-if-absent-compute" works better, e.g. defaultValueCache in GenericData.java.


            People

            • Assignee:
              Andrius Druzinis-Vitkus
              Reporter:
              Stephen Tu
            • Votes: 2
              Watchers: 11

