Uploaded image for project: 'Avro'
  1. Avro
  2. AVRO-2051

Thread contention accessing JsonProperties props

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None
    • Flags:
      Patch

      Description

      See https://lists.apache.org/thread.html/dd34ab8439137a81a9de29ad4161f37b17638394cea0806765689976@%3Cuser.avro.apache.org%3E

      Basically, the getJsonProp method, being synchronized, is causing thread contention issues when trying to share schemas between threads. My proposal (pull request forthcoming) is to treat "props" as an immutable map and do a copy+add+swap for the addProp method. This will make the addProp call slower (particularly for large maps of props), but would make the reads significantly faster as no locking will be needed.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user dkulp opened a pull request:

          https://github.com/apache/avro/pull/236

          AVRO-2051 Remove synchronization for JsonProperties.getJsonProp

          This change does two basic things:

          1) Makes "props" a private field and requires the subclasses to access it via the additional methods. This allows some changing of the underlying implementation a bit easier.

          2) Change props to an AtomicReference and makes it act like an immutable map. The addProp method does a full copy of the map, adds the new value, and then atomicly swaps in the map thus not affecting other threads that would be using the value that was "current" when they called the get method.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dkulp/avro master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/236.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #236


          commit ad14635fa3af97b90282a79b7e04a0b8753e45b5
          Author: Daniel Kulp <dkulp@apache.org>
          Date: 2017-07-17T19:08:10Z

          AVRO-2051 Remove synchronization for JsonProperties.getJsonProp


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user dkulp opened a pull request: https://github.com/apache/avro/pull/236 AVRO-2051 Remove synchronization for JsonProperties.getJsonProp This change does two basic things: 1) Makes "props" a private field and requires the subclasses to access it via the additional methods. This allows some changing of the underlying implementation a bit easier. 2) Change props to an AtomicReference and makes it act like an immutable map. The addProp method does a full copy of the map, adds the new value, and then atomicly swaps in the map thus not affecting other threads that would be using the value that was "current" when they called the get method. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dkulp/avro master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/avro/pull/236.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #236 commit ad14635fa3af97b90282a79b7e04a0b8753e45b5 Author: Daniel Kulp <dkulp@apache.org> Date: 2017-07-17T19:08:10Z AVRO-2051 Remove synchronization for JsonProperties.getJsonProp
          Hide
          cutting Doug Cutting added a comment -

          This can make building schemas quadratic in the number of properties, no? While for most schemas this is probably not an issue, for some it might significantly impact performance.

          I think instead we should just bite the bullet and make Schema immutable, eliminating the addProp method altogether. At the same time, we should stop exposing JsonNode in the public API, instead using only Object, as intended in AVRO-1585.

          Show
          cutting Doug Cutting added a comment - This can make building schemas quadratic in the number of properties, no? While for most schemas this is probably not an issue, for some it might significantly impact performance. I think instead we should just bite the bullet and make Schema immutable, eliminating the addProp method altogether. At the same time, we should stop exposing JsonNode in the public API, instead using only Object, as intended in AVRO-1585 .
          Hide
          dkulp Daniel Kulp added a comment -

          I'm trying to find something that will work for Avro 1.8.x as that's what we'll need. Thus, removing all of that is likely not an option.

          That said, I just discovered that we already have parts of guava shaded in as a dependency. Thus, I believe I can use the CacheBuilder to create the equivalent of a "ConcurrentLinkedHashMap" (there are some google links that mention this) that would work for this and not have the quadratic issue. I'll investigate more tomorrow. Another option would be to either add a dependency to something else (like caffeine) that has a ConcurrentLinkedHashMap or copy/shade an Apache licensed version (like https://github.com/ben-manes/concurrentlinkedhashmap/blob/master/src/main/java/com/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap.java) into the src and use it.

          Show
          dkulp Daniel Kulp added a comment - I'm trying to find something that will work for Avro 1.8.x as that's what we'll need. Thus, removing all of that is likely not an option. That said, I just discovered that we already have parts of guava shaded in as a dependency. Thus, I believe I can use the CacheBuilder to create the equivalent of a "ConcurrentLinkedHashMap" (there are some google links that mention this) that would work for this and not have the quadratic issue. I'll investigate more tomorrow. Another option would be to either add a dependency to something else (like caffeine) that has a ConcurrentLinkedHashMap or copy/shade an Apache licensed version (like https://github.com/ben-manes/concurrentlinkedhashmap/blob/master/src/main/java/com/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap.java ) into the src and use it.
          Hide
          dkulp Daniel Kulp added a comment - - edited

          Of course another option is to just surround the access to the props with a ReentrantReadWriteLock..... Bunch of ideas to test and benchmark.

          Show
          dkulp Daniel Kulp added a comment - - edited Of course another option is to just surround the access to the props with a ReentrantReadWriteLock..... Bunch of ideas to test and benchmark.
          Hide
          dkulp Daniel Kulp added a comment -


          I just updated the pull request. Basically, since the JsonProperties props field is only ever iterated across and added to (values never change or are removed), it was simple enough to create a ConcurrentHashMap subclass that can also maintain a linked list type thing to maintain the order. This keeps the O(log N) lookups, O(log N) inserts, and O(N) iterator.

          Show
          dkulp Daniel Kulp added a comment - I just updated the pull request. Basically, since the JsonProperties props field is only ever iterated across and added to (values never change or are removed), it was simple enough to create a ConcurrentHashMap subclass that can also maintain a linked list type thing to maintain the order. This keeps the O(log N) lookups, O(log N) inserts, and O(N) iterator.
          Hide
          cutting Doug Cutting added a comment -

          Should we perhaps instead put the ConcurrentLinkedHashMap into the org.apache.avro.util package, rather than implementing this as a nested utility class?

          Also, I'm not sure why you added methods for the hashCode, equals and hasProps, but I question whether these should be protected (where they appear in javadoc for subclasses to override) or rather package private (so that they can only be used by other classes in the package). I generally think we should avoid adding publicly visible methods that we don't expect things outside the package to ever need to call. The Schema classes are not meant to be extensible. Perhaps this should at least be a separate issue, since it's not related to the thread contention issue?

          Lastly, it would be good to add a test that verifys the thread contention has been addressed. This might be as simple as a synchronized(schema) thread that blocks another thread that tries to change properties without the patch and does not with the patch.

          Show
          cutting Doug Cutting added a comment - Should we perhaps instead put the ConcurrentLinkedHashMap into the org.apache.avro.util package, rather than implementing this as a nested utility class? Also, I'm not sure why you added methods for the hashCode, equals and hasProps, but I question whether these should be protected (where they appear in javadoc for subclasses to override) or rather package private (so that they can only be used by other classes in the package). I generally think we should avoid adding publicly visible methods that we don't expect things outside the package to ever need to call. The Schema classes are not meant to be extensible. Perhaps this should at least be a separate issue, since it's not related to the thread contention issue? Lastly, it would be good to add a test that verifys the thread contention has been addressed. This might be as simple as a synchronized(schema) thread that blocks another thread that tries to change properties without the patch and does not with the patch.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 2m 6s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 buildtest 0m 0s master passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 buildtest 0m 23s java in the patch failed.
          2m 34s



          Subsystem Report/Notes
          Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a
          JIRA Issue AVRO-2051
          GITHUB PR https://github.com/apache/avro/pull/236
          Optional Tests buildtest javac
          uname Linux 45aa718b6e74 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux
          Build tool build
          git revision master / 793178a
          Default Java 1.7.0_111
          buildtest https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/artifact/patchprocess/test--lang_java.txt
          modules C: lang/java U: lang/java
          Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/console
          Powered by Apache Yetus 0.4.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 2m 6s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 buildtest 0m 0s master passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 buildtest 0m 23s java in the patch failed. 2m 34s Subsystem Report/Notes Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a JIRA Issue AVRO-2051 GITHUB PR https://github.com/apache/avro/pull/236 Optional Tests buildtest javac uname Linux 45aa718b6e74 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux Build tool build git revision master / 793178a Default Java 1.7.0_111 buildtest https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/artifact/patchprocess/test--lang_java.txt modules C: lang/java U: lang/java Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user dkulp closed the pull request at:

          https://github.com/apache/avro/pull/236

          Show
          githubbot ASF GitHub Bot added a comment - Github user dkulp closed the pull request at: https://github.com/apache/avro/pull/236
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user dkulp opened a pull request:

          https://github.com/apache/avro/pull/245

          AVRO-2051 Remove synchronization for JsonProperties.getJsonProp

          This change does two basic things:

          Makes "props" a private field and requires the subclasses to access it via the additional methods. This allows some changing of the underlying implementation a bit easier.

          Change props to an ConcurrentHashMap with an extra ConcurrentLinkedQueue to maintain order. With the (1) change, this is fairly simple. This keeps the O(log N) puts and lookups and a simple entrySet iterator for quick iterations. Synchronized blocks are removed.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dkulp/avro AVRO-2051

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/245.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #245


          commit ef757c5aefd344a32e100736d419b4717d4dfd9f
          Author: Daniel Kulp <dkulp@apache.org>
          Date: 2017-07-17T19:08:10Z

          AVRO-2051 Remove synchronization for JsonProperties.getJsonProp


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user dkulp opened a pull request: https://github.com/apache/avro/pull/245 AVRO-2051 Remove synchronization for JsonProperties.getJsonProp This change does two basic things: Makes "props" a private field and requires the subclasses to access it via the additional methods. This allows some changing of the underlying implementation a bit easier. Change props to an ConcurrentHashMap with an extra ConcurrentLinkedQueue to maintain order. With the (1) change, this is fairly simple. This keeps the O(log N) puts and lookups and a simple entrySet iterator for quick iterations. Synchronized blocks are removed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dkulp/avro AVRO-2051 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/avro/pull/245.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #245 commit ef757c5aefd344a32e100736d419b4717d4dfd9f Author: Daniel Kulp <dkulp@apache.org> Date: 2017-07-17T19:08:10Z AVRO-2051 Remove synchronization for JsonProperties.getJsonProp
          Hide
          dkulp Daniel Kulp added a comment -

          Redid the pull request to base off a branch so I can keep my fork more in sync.....

          Anyway, I made the hashCode and propsEquals package protected which should keep those from being exposed. I needed to add these methods as the subclass (Protocol) needs to know those, but I wanted to keep the "props" field private to make sure any subclass or other does not do any direct access to the props field. This forces subclasses (and others) to use the addProp method where we can make sure properties don't "change". (Properties can only be added, not modified)

          I don't think putting the ConcurrentHashMap into utils makes sense. This is a VERY particular use case and only certain methods are implemented for it. By keeping it anonymous and limiting access (via a private field), we can make sure the proper methods are called. For example, for a proper public class, keySet() would need to be implemented to return the keys in the proper order. I didn't implement it as, in this case, it's not ever called. Likewise, clear/remove/etc... are not implemented as they are not needed in this case. (the comment in the code mentions this)

          Show
          dkulp Daniel Kulp added a comment - Redid the pull request to base off a branch so I can keep my fork more in sync..... Anyway, I made the hashCode and propsEquals package protected which should keep those from being exposed. I needed to add these methods as the subclass (Protocol) needs to know those, but I wanted to keep the "props" field private to make sure any subclass or other does not do any direct access to the props field. This forces subclasses (and others) to use the addProp method where we can make sure properties don't "change". (Properties can only be added, not modified) I don't think putting the ConcurrentHashMap into utils makes sense. This is a VERY particular use case and only certain methods are implemented for it. By keeping it anonymous and limiting access (via a private field), we can make sure the proper methods are called. For example, for a proper public class, keySet() would need to be implemented to return the keys in the proper order. I didn't implement it as, in this case, it's not ever called. Likewise, clear/remove/etc... are not implemented as they are not needed in this case. (the comment in the code mentions this)

            People

            • Assignee:
              Unassigned
              Reporter:
              dkulp Daniel Kulp
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development