
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      We are stressing an Ozone cluster with heavy EC writes and reads, and we hit the following exception on exactly one DN:

      2022-04-14 15:02:04,422 [ChunkReader-46] INFO org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: WriteChunk , Trace ID:  , Message: Chunk file offset 1048576 does not match blockFile length 0 , Result: CHUNK_FILE_INCONSISTENCY , StorageContainerException Occurred.
      org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunk file offset 1048576 does not match blockFile length 0
              at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.validateChunkSize(ChunkUtils.java:391)
              at org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.writeChunk(FilePerBlockStrategy.java:140)
              at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.writeChunk(ChunkManagerDispatcher.java:73)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:705)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:224)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:188)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:307)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$14(HddsDispatcher.java:169)
              at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
              at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:57)
              at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:50)
              at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:255)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
              at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2022-04-14 15:02:04,441 [ChunkReader-46] WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614 to state UNHEALTHY from state:UNHEALTHY Trace:java.lang.Thread.getStackTrace(Thread.java:1559) 

      And this results in an UNHEALTHY container replica:

      [ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin container info 614
      Container id: 614
      Pipeline id: bddd5494-af98-46b2-a4c3-6757f129d3d2
      Container State: CLOSED
      Datanodes: [...]
      Replicas: [State: CLOSED; ReplicaIndex: 7; Origin: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad; Location: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad/9.27.141.78,
      State: CLOSED; ReplicaIndex: 10; Origin: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea; Location: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea/9.27.142.97,
      State: CLOSED; ReplicaIndex: 9; Origin: 65024339-4831-43c0-9fc0-752f8999d9c4; Location: 65024339-4831-43c0-9fc0-752f8999d9c4/9.27.140.135,
      State: CLOSED; ReplicaIndex: 8; Origin: e963db9e-44ea-425c-92e3-bdb5305c3049; Location: e963db9e-44ea-425c-92e3-bdb5305c3049/9.27.142.51,
      State: CLOSED; ReplicaIndex: 6; Origin: b6115b17-827c-484f-b61a-5f5849c5dbdb; Location: b6115b17-827c-484f-b61a-5f5849c5dbdb/9.27.141.121,
      State: UNHEALTHY; ReplicaIndex: 12; Origin: ae31f944-b6c9-4e18-b677-0b0edb2a8e13; Location: ae31f944-b6c9-4e18-b677-0b0edb2a8e13/9.27.141.78,
      State: CLOSED; ReplicaIndex: 2; Origin: 4a6af5ae-097a-4daa-9390-b0fbe87446f1; Location: 4a6af5ae-097a-4daa-9390-b0fbe87446f1/9.27.144.159,
      State: CLOSED; ReplicaIndex: 3; Origin: 3c3bc79e-3b5f-4f73-b502-7778a7993db6; Location: 3c3bc79e-3b5f-4f73-b502-7778a7993db6/9.27.140.38,
      State: CLOSED; ReplicaIndex: 14; Origin: 37db964c-0663-4b7c-915c-eaffccc32f55; Location: 37db964c-0663-4b7c-915c-eaffccc32f55/9.27.142.51,
      State: CLOSED; ReplicaIndex: 11; Origin: dc262c7c-184e-4f66-af98-2ece21deeab1; Location: dc262c7c-184e-4f66-af98-2ece21deeab1/9.27.140.222,
      State: CLOSED; ReplicaIndex: 13; Origin: e7ea47c7-6d33-490a-9d3a-3c8231c81e30; Location: e7ea47c7-6d33-490a-9d3a-3c8231c81e30/9.27.144.169,
      State: CLOSED; ReplicaIndex: 1; Origin: 8333d554-2e39-4ef6-b73a-01539e82a9d1; Location: 8333d554-2e39-4ef6-b73a-01539e82a9d1/9.27.144.226,
      State: CLOSED; ReplicaIndex: 4; Origin: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae; Location: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae/9.27.141.230,
      State: CLOSED; ReplicaIndex: 5; Origin: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa; Location: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa/9.27.140.135] 

      From the log, we can see that a block file received a write request at offset 1048576 while the DN saw the block file as having length 0.

      This should be impossible, because the EC client writes stripes one by one: it does not send the next stripe's chunks to the DNs before it has received success responses from the DNs for the previous stripe.

      I also checked the container directory for the block file: it has length exactly 1048576, not 0:

      [ozoneadmin@TENCENT64 ~]$ ls -t /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614/chunks/ -l
      total 1468508
      -rw-rw-r-- 1 ozoneadmin ozoneadmin  40894464 Apr 14 15:02 109611004723334158.block
      -rw-rw-r-- 1 ozoneadmin ozoneadmin  41943040 Apr 14 15:02 109611004723334160.block
      -rw-rw-r-- 1 ozoneadmin ozoneadmin  39845888 Apr 14 15:02 109611004723334157.block
      -rw-rw-r-- 1 ozoneadmin ozoneadmin   1048576 Apr 14 15:02 109611004723334165.block       <-- the target block file 

      So I suspect that the problem lies in the check done by the DN itself:

      public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
          throws StorageContainerException {
        long offset = chunkInfo.getOffset();
        // Suspicious: the length is read via File#length(), i.e. through
        // filesystem metadata, not through the FileChannel we write with.
        long len = chunkFile.length();
        if (chunkFile.length() != offset) {
          throw new StorageContainerException(
              "Chunk file offset " + offset + " does not match blockFile length " +
              len, CHUNK_FILE_INCONSISTENCY);
        }
      } 

      Here we get the length of the existing block file via `chunkFile.length()`, which is suspicious: we write to block files through FileChannels, so we should also get the file size through the channel. When we call `chunkFile.length()`, the channel may not have created (or flushed metadata for) the block file yet, so we may read 0 here. This may be the cause of the problem.
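
      If that is the case, one possible direction could be to validate against the FileChannel instead of the File. The following is just a sketch of the idea (not a final patch; the plumbing of the open channel from FilePerBlockStrategy down into the check is left out):

      // Sketch only: FileChannel#size() asks the open file descriptor, so it
      // reflects everything already written through that channel, while
      // File#length() goes through filesystem metadata and may still report 0.
      public static void validateChunkSize(FileChannel fileChannel,
          ChunkInfo chunkInfo) throws StorageContainerException {
        long offset = chunkInfo.getOffset();
        long fileLen;
        try {
          fileLen = fileChannel.size();
        } catch (IOException e) {
          throw new StorageContainerException(
              "Cannot get the size of the block file", CHUNK_FILE_INCONSISTENCY);
        }
        if (fileLen != offset) {
          throw new StorageContainerException(
              "Chunk offset " + offset + " does not match blockFile length " +
              fileLen, CHUNK_FILE_INCONSISTENCY);
        }
      }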

      It is hard to reproduce this problem with simple tests; it only shows up under heavy load. But I did a small experiment to show the difference between `fileChannel.size()` and `file.length()`; the small program can be found in the attachments (FileChannelSize.java).

      From the experiment results, I see that when `fileChannel.size()` = 1 MB, we get `file.length()` = 0, which is exactly the case we hit here.
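
      For reference, the gist of the comparison is sketched below (my reconstruction in plain form, not the attached FileChannelSize.java; the file name test.block is arbitrary):

      import java.io.File;
      import java.io.IOException;
      import java.io.RandomAccessFile;
      import java.nio.ByteBuffer;
      import java.nio.channels.FileChannel;

      // Minimal sketch (not the attached FileChannelSize.java): write 1 MB
      // through a FileChannel and print FileChannel#size() next to
      // File#length() while the channel is still open, to see whether the
      // two views of the file length can disagree.
      public class ChannelVsFileLength {
        public static void main(String[] args) throws IOException {
          File file = new File("test.block");  // arbitrary test file
          try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
               FileChannel channel = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(1024 * 1024);  // 1 MB of zeros
            channel.write(buf, 0);  // write the whole buffer at offset 0

            System.out.println("channel.size() = " + channel.size());
            System.out.println("file.length()  = " + file.length());
          } finally {
            file.delete();
          }
        }
      }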

       

      Attachments

        FileChannelSize.java (6 kB, Mark Gui)
