Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
None
Description
Concurrent requests to datanode for the same chunk may result in the following exception in datanode:
java.nio.channels.OverlappingFileLockException at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229) at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123) at java.base/sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178) at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185) at java.base/sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118) at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:175) at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:213) at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:574) at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:195) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271) at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148) at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73) at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
It seems this is covered by retry logic, as key read is eventually successful at client side.
The problem is that:
File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine. (source)
code ref: ChunkUtils.readData
Attachments
Attachments
Issue Links
- breaks
-
HDDS-4970 Significant overhead when DataNode is over-subscribed
- Resolved
- links to