Hadoop Distributed Data Store / HDDS-2026

Overlapping chunk region cannot be read concurrently

    Details

      Description

      Concurrent read requests to a datanode for the same chunk may fail on the datanode with the following exception:

      java.nio.channels.OverlappingFileLockException
         at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
         at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
         at java.base/sun.nio.ch.AsynchronousFileChannelImpl.addToFileLockTable(AsynchronousFileChannelImpl.java:178)
         at java.base/sun.nio.ch.SimpleAsynchronousFileChannelImpl.implLock(SimpleAsynchronousFileChannelImpl.java:185)
         at java.base/sun.nio.ch.AsynchronousFileChannelImpl.lock(AsynchronousFileChannelImpl.java:118)
         at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.readData(ChunkUtils.java:175)
         at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerImpl.readChunk(ChunkManagerImpl.java:213)
         at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleReadChunk(KeyValueHandler.java:574)
         at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:195)
         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:271)
         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
         at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
         at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
      

      This appears to be covered by retry logic, since the key read eventually succeeds on the client side.

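      The problem is explained in the java.nio.channels.FileLock Javadoc:

      File locks are held on behalf of the entire Java virtual machine. They are not suitable for controlling access to a file by multiple threads within the same virtual machine.

      The datanode serves concurrent reads from threads in a single JVM, so a second lock request on an overlapping region of the same chunk file is rejected with OverlappingFileLockException instead of waiting for the first lock to be released.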
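
      The behavior can be reproduced outside Ozone. The following is a minimal sketch, not the code in ChunkUtils.readData: it assumes two AsynchronousFileChannel instances opened on the same file within one JVM, and the names OverlappingLockDemo and secondLockOverlaps are hypothetical. Both requests ask for a shared lock, yet the second one is expected to be rejected because the JVM already holds a lock on an overlapping region.

```java
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Ozone.
final class OverlappingLockDemo {

    // Returns true if a second lock request on the same region is rejected
    // while the first lock is still held by this JVM.
    static boolean secondLockOverlaps(Path file) throws Exception {
        try (AsynchronousFileChannel first =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ);
             AsynchronousFileChannel second =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            // Acquire a shared lock on bytes [0, 16) and wait for it.
            FileLock held = first.lock(0, 16, true).get();
            try {
                // Same region, same JVM: the overlap check fails in the
                // calling thread, before any Future is returned.
                second.lock(0, 16, true);
                return false;
            } catch (OverlappingFileLockException e) {
                return true;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("chunk", ".data");
        try {
            Files.write(file, new byte[32]);
            System.out.println("second lock rejected: " + secondLockOverlaps(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

      In the datanode the two requests come from different handler threads rather than from one method, but the lock table is consulted per JVM, so the outcome is the same.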

      code ref: ChunkUtils.readData
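
      One possible direction, shown here only as a sketch and not as the change adopted for this issue (the attached diffs are not reproduced here), is to coordinate readers with an in-process lock instead of a file lock. The class InProcessChunkReader, its read method, and the per-path lock registry are all hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper; not part of Ozone.
final class InProcessChunkReader {

    // One read/write lock per file path. Entries are never evicted in this
    // sketch; a real implementation would need to bound this map.
    private static final ConcurrentMap<Path, ReadWriteLock> LOCKS =
        new ConcurrentHashMap<>();

    // Reads up to `length` bytes starting at `offset`. Any number of threads
    // may hold the read lock at once, so overlapping reads do not conflict.
    static ByteBuffer read(Path file, long offset, int length) throws IOException {
        ReadWriteLock lock = LOCKS.computeIfAbsent(
            file.toAbsolutePath().normalize(), p -> new ReentrantReadWriteLock());
        lock.readLock().lock();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long position = offset;
            while (buffer.hasRemaining()) {
                int n = channel.read(buffer, position);
                if (n < 0) {
                    break; // end of file reached before `length` bytes
                }
                position += n;
            }
            buffer.flip();
            return buffer;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

      A writer in the same JVM would take the corresponding write lock. This only protects against threads in one process; it gives no protection against other processes touching the file, which is the case file locks are designed for.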

        Attachments

        1. changes.diff (3 kB, Anu Engineer)
        2. first-cut-proposed.diff (3 kB, Anu Engineer)
        3. HDDS-2026-repro.patch (4 kB, Attila Doroszlai)

              People

              • Assignee: Attila Doroszlai (adoroszlai)
              • Reporter: Attila Doroszlai (adoroszlai)
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2h 10m