Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 1.16.0
- None
- None
Description
Extracted from FLINK-26835:
While investigating this issue, we found that state backends are probably also using non-thread-safe serializers from different threads.
For example, RocksFullSnapshotStrategy#syncPrepareResources passes keySerializer from the task thread to the async thread in order to serialize the serializer itself. RocksIncrementalSnapshotStrategy.RocksDBIncrementalSnapshotOperation#materializeMetaData seems to be doing the same thing. If PojoSerializer is used as keySerializer, I think this will lead to the same problems as above: the async checkpoint thread iterates through PojoSerializer#subclassSerializerCache while the task thread can modify the map. It looks like in all of those places the serializer should have been duplicated (#duplicate) before being passed to another thread. Maybe this should happen in RocksDBSnapshotStrategyBase. I don't know about other state backends.
=======
TODO:
I second that each thread should obtain ownership of the Serializer instance passed to it.
- Figure out whether the current way of passing the Serializer is really causing problems in the state backend (i.e. whether concurrent modification is possible).
- Find which other places pass a Serializer directly between threads.
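To illustrate the suggested fix (duplicate the serializer before handing it to another thread), here is a minimal, self-contained sketch. It does not use Flink classes: CachingSerializer and SnapshotSketch are hypothetical stand-ins, and the unsynchronized map only mimics the role that PojoSerializer#subclassSerializerCache plays in the description above. The actual change in the RocksDB snapshot strategies may well look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a stateful, non-thread-safe serializer (not a Flink class). */
final class CachingSerializer {
    // Mutable, unsynchronized cache; plays the role of a subclass serializer cache.
    private final Map<Class<?>, String> subclassCache = new HashMap<>();

    /** Called on the task thread; mutates the cache. */
    void register(Class<?> clazz) {
        subclassCache.put(clazz, clazz.getName());
    }

    /** Called while writing snapshot metadata; iterates the cache and counts entries. */
    int writeSnapshot() {
        int written = 0;
        for (Map.Entry<Class<?>, String> entry : subclassCache.entrySet()) {
            written += entry.getValue().isEmpty() ? 0 : 1;
        }
        return written;
    }

    /** Returns an independent copy that owns its own cache. */
    CachingSerializer duplicate() {
        CachingSerializer copy = new CachingSerializer();
        copy.subclassCache.putAll(subclassCache);
        return copy;
    }
}

final class SnapshotSketch {
    /** Duplicates on the calling (task) thread, then lets the async thread use only the copy. */
    static int snapshotWithDuplicate(CachingSerializer taskSerializer) {
        // Duplicate before the hand-off, so the async thread never sees the shared map.
        CachingSerializer asyncCopy = taskSerializer.duplicate();
        Callable<Integer> job = asyncCopy::writeSnapshot;
        ExecutorService asyncThread = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> result = asyncThread.submit(job);
            // The task thread keeps mutating its own instance; the copy is unaffected.
            taskSerializer.register(Long.class);
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("snapshot failed", e);
        } finally {
            asyncThread.shutdown();
        }
    }
}
```

Without the duplicate() call, the iteration in writeSnapshot and the register call could touch the same HashMap from two threads, which is the kind of concurrent modification the first TODO item asks to confirm.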
Issue Links
- is related to FLINK-26835 WordCountSubclassPOJOITCase failed on azure (Closed)