Uploaded image for project: 'Avro'
  1. Avro
  2. AVRO-2051

Thread contention accessing JsonProperties props

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.2
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None
    • Flags:
      Patch

      Description

      See https://lists.apache.org/thread.html/dd34ab8439137a81a9de29ad4161f37b17638394cea0806765689976@%3Cuser.avro.apache.org%3E

      Basically, the getJsonProp method, being synchronized, is causing thread contention issues when trying to share schemas between threads. My proposal (pull request forthcoming) is to treat "props" as an immutable map and do a copy+add+swap for the addProp method. This will make the addProp call slower (particularly for large maps of props), but would make the reads significantly faster as no locking will be needed.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user dkulp opened a pull request:

          https://github.com/apache/avro/pull/236

          AVRO-2051 Remove synchronization for JsonProperties.getJsonProp

          This change does two basic things:

          1) Makes "props" a private field and requires the subclasses to access it via the additional methods. This allows some changing of the underlying implementation a bit easier.

          2) Change props to an AtomicReference and makes it act like an immutable map. The addProp method does a full copy of the map, adds the new value, and then atomicly swaps in the map thus not affecting other threads that would be using the value that was "current" when they called the get method.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/dkulp/avro master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/avro/pull/236.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #236


          commit ad14635fa3af97b90282a79b7e04a0b8753e45b5
          Author: Daniel Kulp <dkulp@apache.org>
          Date: 2017-07-17T19:08:10Z

          AVRO-2051 Remove synchronization for JsonProperties.getJsonProp


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user dkulp opened a pull request: https://github.com/apache/avro/pull/236 AVRO-2051 Remove synchronization for JsonProperties.getJsonProp This change does two basic things: 1) Makes "props" a private field and requires the subclasses to access it via the additional methods. This allows some changing of the underlying implementation a bit easier. 2) Change props to an AtomicReference and makes it act like an immutable map. The addProp method does a full copy of the map, adds the new value, and then atomicly swaps in the map thus not affecting other threads that would be using the value that was "current" when they called the get method. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dkulp/avro master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/avro/pull/236.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #236 commit ad14635fa3af97b90282a79b7e04a0b8753e45b5 Author: Daniel Kulp <dkulp@apache.org> Date: 2017-07-17T19:08:10Z AVRO-2051 Remove synchronization for JsonProperties.getJsonProp
          Hide
          cutting Doug Cutting added a comment -

          This can make building schemas quadratic in the number of properties, no? While for most schemas this is probably not an issue, for some it might significantly impact performance.

          I think instead we should just bite the bullet and make Schema immutable, eliminating the addProp method altogether. At the same time, we should stop exposing JsonNode in the public API, instead using only Object, as intended in AVRO-1585.

          Show
          cutting Doug Cutting added a comment - This can make building schemas quadratic in the number of properties, no? While for most schemas this is probably not an issue, for some it might significantly impact performance. I think instead we should just bite the bullet and make Schema immutable, eliminating the addProp method altogether. At the same time, we should stop exposing JsonNode in the public API, instead using only Object, as intended in AVRO-1585 .
          Hide
          dkulp Daniel Kulp added a comment -

          I'm trying to find something that will work for Avro 1.8.x as that's what we'll need. Thus, removing all of that is likely not an option.

          That said, I just discovered that we already have parts of guava shaded in as a dependency. Thus, I believe I can use the CacheBuilder to create the equivalent of a "ConcurrentLinkedHashMap" (there are some google links that mention this) that would work for this and not have the quadratic issue. I'll investigate more tomorrow. Another option would be to either add a dependency to something else (like caffeine) that has a ConcurrentLinkedHashMap or copy/shade an Apache licensed version (like https://github.com/ben-manes/concurrentlinkedhashmap/blob/master/src/main/java/com/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap.java) into the src and use it.

          Show
          dkulp Daniel Kulp added a comment - I'm trying to find something that will work for Avro 1.8.x as that's what we'll need. Thus, removing all of that is likely not an option. That said, I just discovered that we already have parts of guava shaded in as a dependency. Thus, I believe I can use the CacheBuilder to create the equivalent of a "ConcurrentLinkedHashMap" (there are some google links that mention this) that would work for this and not have the quadratic issue. I'll investigate more tomorrow. Another option would be to either add a dependency to something else (like caffeine) that has a ConcurrentLinkedHashMap or copy/shade an Apache licensed version (like https://github.com/ben-manes/concurrentlinkedhashmap/blob/master/src/main/java/com/googlecode/concurrentlinkedhashmap/ConcurrentLinkedHashMap.java ) into the src and use it.
          Hide
          dkulp Daniel Kulp added a comment - - edited

          Of course another option is to just surround the access to the props with a ReentrantReadWriteLock..... Bunch of ideas to test and benchmark.

          Show
          dkulp Daniel Kulp added a comment - - edited Of course another option is to just surround the access to the props with a ReentrantReadWriteLock..... Bunch of ideas to test and benchmark.
          Hide
          dkulp Daniel Kulp added a comment -


          I just updated the pull request. Basically, since the JsonProperties props field is only ever iterated across and added to (values never change or are removed), it was simple enough to create a ConcurrentHashMap subclass that can also maintain a linked list type thing to maintain the order. This keeps the O(log N) lookups, O(log N) inserts, and O(N) iterator.

          Show
          dkulp Daniel Kulp added a comment - I just updated the pull request. Basically, since the JsonProperties props field is only ever iterated across and added to (values never change or are removed), it was simple enough to create a ConcurrentHashMap subclass that can also maintain a linked list type thing to maintain the order. This keeps the O(log N) lookups, O(log N) inserts, and O(N) iterator.
          Hide
          cutting Doug Cutting added a comment -

          Should we perhaps instead put the ConcurrentLinkedHashMap into the org.apache.avro.util package, rather than implementing this as a nested utility class?

          Also, I'm not sure why you added methods for the hashCode, equals and hasProps, but I question whether these should be protected (where they appear in javadoc for subclasses to override) or rather package private (so that they can only be used by other classes in the package). I generally think we should avoid adding publicly visible methods that we don't expect things outside the package to ever need to call. The Schema classes are not meant to be extensible. Perhaps this should at least be a separate issue, since it's not related to the thread contention issue?

          Lastly, it would be good to add a test that verifys the thread contention has been addressed. This might be as simple as a synchronized(schema) thread that blocks another thread that tries to change properties without the patch and does not with the patch.

          Show
          cutting Doug Cutting added a comment - Should we perhaps instead put the ConcurrentLinkedHashMap into the org.apache.avro.util package, rather than implementing this as a nested utility class? Also, I'm not sure why you added methods for the hashCode, equals and hasProps, but I question whether these should be protected (where they appear in javadoc for subclasses to override) or rather package private (so that they can only be used by other classes in the package). I generally think we should avoid adding publicly visible methods that we don't expect things outside the package to ever need to call. The Schema classes are not meant to be extensible. Perhaps this should at least be a separate issue, since it's not related to the thread contention issue? Lastly, it would be good to add a test that verifys the thread contention has been addressed. This might be as simple as a synchronized(schema) thread that blocks another thread that tries to change properties without the patch and does not with the patch.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 2m 6s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 buildtest 0m 0s master passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 buildtest 0m 23s java in the patch failed.
          2m 34s



          Subsystem Report/Notes
          Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a
          JIRA Issue AVRO-2051
          GITHUB PR https://github.com/apache/avro/pull/236
          Optional Tests buildtest javac
          uname Linux 45aa718b6e74 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux
          Build tool build
          git revision master / 793178a
          Default Java 1.7.0_111
          buildtest https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/artifact/patchprocess/test--lang_java.txt
          modules C: lang/java U: lang/java
          Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/console
          Powered by Apache Yetus 0.4.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 2m 6s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 buildtest 0m 0s master passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 buildtest 0m 23s java in the patch failed. 2m 34s Subsystem Report/Notes Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a JIRA Issue AVRO-2051 GITHUB PR https://github.com/apache/avro/pull/236 Optional Tests buildtest javac uname Linux 45aa718b6e74 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux Build tool build git revision master / 793178a Default Java 1.7.0_111 buildtest https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/artifact/patchprocess/test--lang_java.txt modules C: lang/java U: lang/java Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/29/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.

            People

            • Assignee:
              Unassigned
              Reporter:
              dkulp Daniel Kulp
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development