Affects Version/s: 0.6, 0.6.1, 0.7, 0.8, 0.9, 0.9.1, 0.9.2, 0.9.3, 0.9.3.1, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.14.0, 0.14.1
Fix Version/s: 0.15.0
Component/s: Java - Library
AdoptOpenJDK 1.8 (Java 8); Linux, AWS machines.
The deadlock is environment-independent and follows directly from the class initialization behavior required by the JRE 8 spec.
Patch Info: Patch Available
We (Pinterest) hit the following deadlock during JVM classloading of Thrift classes. The root cause is that FieldMetaData.getStructMetaDataMap(..), a static synchronized method, calls sClass.newInstance() (which triggers static initialization of the Thrift class) while holding the lock on FieldMetaData.class.
Here are the stack traces of interest:
Here's the code of interest:
Thread 1 has the following lock acquisition order:
Thread 2 has the following lock acquisition order:
Internally, this was a fairly detailed investigation. It would be great if we could agree on a fix approach. This deadlock affects all versions of Thrift since 0.9, which introduced the synchronized keyword. Versions prior to 0.9 are simply thread-unsafe, because there is no lock on FieldMetaData's internal HashMap and the two static method calls can race. I didn't check versions below 0.5, but this probably extends all the way back.
This is not an issue in fbThrift, which correctly uses a ConcurrentHashMap and does not take a lock on FieldMetaData. See this code here:
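A minimal sketch of the lock-free idea, assuming a simple registry keyed by class. This is not the fbThrift source; `SafeRegistry`, `SafeStruct` and `raceFirstLookup` are hypothetical names. The point it illustrates: no registry-wide lock is held while class initialization is triggered, so a thread waiting for another thread's in-progress initialization can no longer block that initializer's call to `add`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lock-free registry (stand-in for FieldMetaData, not Thrift code).
final class SafeRegistry {
    private static final Map<Class<?>, String> MAP = new ConcurrentHashMap<>();

    // Called from a generated class's static initializer; takes no registry-wide lock.
    static void add(Class<?> cls, String meta) {
        MAP.put(cls, meta);
    }

    static String get(Class<?> cls) throws ReflectiveOperationException {
        String meta = MAP.get(cls);
        if (meta == null) {
            // Triggers (or waits for) static initialization without holding any
            // lock that the static initializer itself needs.
            cls.getDeclaredConstructor().newInstance();
            meta = MAP.get(cls);
        }
        return meta;
    }
}

// Hypothetical stand-in for a generated Thrift struct.
final class SafeStruct {
    static {
        SafeRegistry.add(SafeStruct.class, "SafeStruct metadata");
    }
}

final class SafeRegistryDemo {
    // Races n threads on the very first lookup of SafeStruct and collects what each sees.
    static String[] raceFirstLookup(int n) throws InterruptedException {
        String[] results = new String[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int idx = i;
            threads[i] = new Thread(() -> {
                try {
                    results[idx] = SafeRegistry.get(SafeStruct.class);
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also makes results[idx] visible to this thread
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String r : raceFirstLookup(8)) {
            System.out.println(r);
        }
    }
}
```

One design note on this sketch: a tempting variant is `MAP.computeIfAbsent(cls, ...)` with the instantiation inside the mapping function. That is best avoided here, because the static initializer would then call `put` on the same map from inside the mapping function, which ConcurrentHashMap does not support.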
Another approach is to use
instead of the synchronized keyword on the methods.
Please confirm which approach is preferred, and we will send a PR on GitHub.
Here's the original deadlock hypothesis as outlined by Jiahuan Liu:
The hypothesis of the deadlock:
ThriftStructDescriptor.get is used in many places for deserialization, and it calls FieldMetaData.getStructMetaDataMap. Inside this method, it tries to load (and initialize) the Thrift class if it has not been loaded yet. Thrift classes call FieldMetaData.addStructMetaDataMap in a static block. So the deadlock may happen when:
- thread A is deserializing -> getStructMetaDataMap (holding the lock on FieldMetaData.class) -> class loading
- thread B is already loading the same Thrift class; its static block needs addStructMetaDataMap to complete, but that method is blocked because thread A holds the lock inside getStructMetaDataMap
- the class loading in thread A is blocked by thread B, because class initialization is also synchronized by Java. See this section of the Java spec:
If the Class object for C indicates that initialization is in progress for C by some other thread, then release LC and block the current thread until informed that the in-progress initialization has completed, at which time repeat this step.
Here's the class initialization procedure from the Java 8 spec (JLS §12.4.2, Detailed Initialization Procedure):
Thread 1 is stuck in step 2 of the initialization procedure (waiting for Thread 2 to notify it in step 10).
Thread 2 is stuck in step 9 (executing the static initializer, which calls addStructMetaDataMap) and never reaches step 10.
Precondition: this is the first instantiation of that Thrift class, and at least two threads are racing to create it, one of which holds the lock on FieldMetaData.class.
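The interleaving described above can be reproduced deterministically outside of Thrift. The following is a minimal Java sketch, not Thrift code: `Registry` is a hypothetical stand-in for FieldMetaData (`add`/`get` mirror addStructMetaDataMap/getStructMetaDataMap), `Struct` stands in for a generated Thrift class, and the two CountDownLatch objects are test-only scaffolding that forces the racy ordering instead of waiting for it to happen by chance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for FieldMetaData (not the real Thrift class).
final class Registry {
    private static final Map<Class<?>, String> MAP = new HashMap<>();

    // Test-only latches that force the problematic interleaving every time.
    static final CountDownLatch initStarted = new CountDownLatch(1);
    static final CountDownLatch registryLocked = new CountDownLatch(1);

    // Plays the role of addStructMetaDataMap: called from a static initializer.
    static synchronized void add(Class<?> cls, String meta) {
        MAP.put(cls, meta);
    }

    // Plays the role of getStructMetaDataMap: triggers class init while holding Registry.class.
    static synchronized String get(Class<?> cls) throws ReflectiveOperationException {
        registryLocked.countDown(); // this thread now owns the Registry.class monitor
        if (!MAP.containsKey(cls)) {
            // Blocks in JLS 12.4.2 step 2 while another thread is initializing cls.
            cls.getDeclaredConstructor().newInstance();
        }
        return MAP.get(cls);
    }
}

// Hypothetical stand-in for a generated Thrift struct.
final class Struct {
    static {
        Registry.initStarted.countDown(); // initialization of Struct is now in progress (step 9)
        try {
            Registry.registryLocked.await(); // wait until the other thread holds the monitor
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Registry.add(Struct.class, "Struct metadata"); // blocks: Registry.class is held
    }
}

final class DeadlockDemo {
    // Returns true if both threads are still blocked after the timeout, i.e. they deadlocked.
    static boolean reproduce(long timeoutMs) throws InterruptedException {
        Thread initializer = new Thread(() -> new Struct(), "thread-2-static-init");
        Thread getter = new Thread(() -> {
            try {
                Registry.initStarted.await(); // make sure Struct's init is already in progress
                Registry.get(Struct.class);   // takes Registry.class, then waits on Struct's init
            } catch (InterruptedException | ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }, "thread-1-getter");
        // Daemon threads, so the stuck pair does not keep the JVM alive.
        initializer.setDaemon(true);
        getter.setDaemon(true);
        initializer.start();
        getter.start();
        initializer.join(timeoutMs);
        getter.join(timeoutMs);
        return initializer.isAlive() && getter.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked: " + reproduce(2000));
    }
}
```

Here "thread-1-getter" corresponds to Thread 1 (stuck in step 2) and "thread-2-static-init" to Thread 2 (stuck in step 9). Without the latches the same two code paths only deadlock when the timing happens to line up, which matches the intermittent behavior one would expect in production.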
Please reach out if this is unclear. This hypothesis was validated by reading through the native JVM code on the class load -> static_initialization path.
Here's the native JVM code if you're interested: InstanceKlass::initialize_impl