Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
We are stressing an Ozone cluster with heavy EC writes and reads, and we hit exactly one exception, from a single DN:
```
2022-04-14 15:02:04,422 [ChunkReader-46] INFO org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: WriteChunk , Trace ID: , Message: Chunk file offset 1048576 does not match blockFile length 0 , Result: CHUNK_FILE_INCONSISTENCY , StorageContainerException Occurred.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunk file offset 1048576 does not match blockFile length 0
    at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.validateChunkSize(ChunkUtils.java:391)
    at org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.writeChunk(FilePerBlockStrategy.java:140)
    at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.writeChunk(ChunkManagerDispatcher.java:73)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:705)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:224)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:188)
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:307)
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$14(HddsDispatcher.java:169)
    at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
    at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:57)
    at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:50)
    at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:255)
    at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
    at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
    at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
    at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2022-04-14 15:02:04,441 [ChunkReader-46] WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614 to state UNHEALTHY from state:UNHEALTHY Trace:java.lang.Thread.getStackTrace(Thread.java:1559)
```
And this results in an UNHEALTHY container replica:
```
[ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin container info 614
Container id: 614
Pipeline id: bddd5494-af98-46b2-a4c3-6757f129d3d2
Container State: CLOSED
Datanodes: [...]
Replicas: [
  State: CLOSED;    ReplicaIndex: 7;  Origin: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad; Location: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad/9.27.141.78,
  State: CLOSED;    ReplicaIndex: 10; Origin: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea; Location: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea/9.27.142.97,
  State: CLOSED;    ReplicaIndex: 9;  Origin: 65024339-4831-43c0-9fc0-752f8999d9c4; Location: 65024339-4831-43c0-9fc0-752f8999d9c4/9.27.140.135,
  State: CLOSED;    ReplicaIndex: 8;  Origin: e963db9e-44ea-425c-92e3-bdb5305c3049; Location: e963db9e-44ea-425c-92e3-bdb5305c3049/9.27.142.51,
  State: CLOSED;    ReplicaIndex: 6;  Origin: b6115b17-827c-484f-b61a-5f5849c5dbdb; Location: b6115b17-827c-484f-b61a-5f5849c5dbdb/9.27.141.121,
  State: UNHEALTHY; ReplicaIndex: 12; Origin: ae31f944-b6c9-4e18-b677-0b0edb2a8e13; Location: ae31f944-b6c9-4e18-b677-0b0edb2a8e13/9.27.141.78,
  State: CLOSED;    ReplicaIndex: 2;  Origin: 4a6af5ae-097a-4daa-9390-b0fbe87446f1; Location: 4a6af5ae-097a-4daa-9390-b0fbe87446f1/9.27.144.159,
  State: CLOSED;    ReplicaIndex: 3;  Origin: 3c3bc79e-3b5f-4f73-b502-7778a7993db6; Location: 3c3bc79e-3b5f-4f73-b502-7778a7993db6/9.27.140.38,
  State: CLOSED;    ReplicaIndex: 14; Origin: 37db964c-0663-4b7c-915c-eaffccc32f55; Location: 37db964c-0663-4b7c-915c-eaffccc32f55/9.27.142.51,
  State: CLOSED;    ReplicaIndex: 11; Origin: dc262c7c-184e-4f66-af98-2ece21deeab1; Location: dc262c7c-184e-4f66-af98-2ece21deeab1/9.27.140.222,
  State: CLOSED;    ReplicaIndex: 13; Origin: e7ea47c7-6d33-490a-9d3a-3c8231c81e30; Location: e7ea47c7-6d33-490a-9d3a-3c8231c81e30/9.27.144.169,
  State: CLOSED;    ReplicaIndex: 1;  Origin: 8333d554-2e39-4ef6-b73a-01539e82a9d1; Location: 8333d554-2e39-4ef6-b73a-01539e82a9d1/9.27.144.226,
  State: CLOSED;    ReplicaIndex: 4;  Origin: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae; Location: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae/9.27.141.230,
  State: CLOSED;    ReplicaIndex: 5;  Origin: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa; Location: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa/9.27.140.135
]
```
From the log, we can see that the block file received a write request at offset 1048576 while the block file was reported to have length 0.
This should be impossible, because the EC client writes stripes one by one: it does not send the chunks of the next stripe to the DNs until it has received success responses from the DNs for the previous stripe (see the sketch below).
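To illustrate why an offset of 1048576 implies the previous 1 MB chunk must already have been acknowledged by that DN, here is a minimal sketch of the stripe-by-stripe write pattern. The `DataNodeClient` interface and `writeChunkAsync` method are hypothetical names used only for the sketch, not the actual Ozone EC client API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the stripe-by-stripe write pattern described above.
public class StripedWriteSketch {

  interface DataNodeClient {
    // Asynchronously writes one chunk at the given offset of a block file.
    CompletableFuture<Void> writeChunkAsync(long blockId, long offset, byte[] chunk);
  }

  static void writeStripes(List<DataNodeClient> dataNodes, long blockId,
      List<byte[][]> stripes, long chunkSize) {
    long offset = 0;
    for (byte[][] stripe : stripes) {
      List<CompletableFuture<Void>> acks = new ArrayList<>();
      // Send every chunk of the current stripe to its DN.
      for (int i = 0; i < stripe.length; i++) {
        acks.add(dataNodes.get(i).writeChunkAsync(blockId, offset, stripe[i]));
      }
      // Wait for every DN to acknowledge this stripe before moving on, so a DN
      // should only ever see offset N * chunkSize after it already holds N chunks.
      CompletableFuture.allOf(acks.toArray(new CompletableFuture[0])).join();
      offset += chunkSize;
    }
  }
}
```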
I also checked the block file in the container directory; it has length exactly 1048576, not 0:
```
[ozoneadmin@TENCENT64 ~]$ ls -t /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614/chunks/ -l
total 1468508
-rw-rw-r-- 1 ozoneadmin ozoneadmin 40894464 Apr 14 15:02 109611004723334158.block
-rw-rw-r-- 1 ozoneadmin ozoneadmin 41943040 Apr 14 15:02 109611004723334160.block
-rw-rw-r-- 1 ozoneadmin ozoneadmin 39845888 Apr 14 15:02 109611004723334157.block
-rw-rw-r-- 1 ozoneadmin ozoneadmin  1048576 Apr 14 15:02 109611004723334165.block   <-- the target block file
```
So I suspect that the problem lies in the check performed by the DN itself:
```java
public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
    throws StorageContainerException {
  long offset = chunkInfo.getOffset();
  long len = chunkFile.length();
  if (chunkFile.length() != offset) {
    throw new StorageContainerException(
        "Chunk file offset " + offset + " does not match blockFile length " + len,
        CHUNK_FILE_INCONSISTENCY);
  }
}
```
Here we obtain the length of the existing block file with `chunkFile.length()`, which is suspicious: we write to block files through FileChannels, so we should get the file size through the channel as well. When `chunkFile.length()` is called, the channel may not have created the block file on disk yet, so the call may return 0. This may be the cause of the problem.
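A minimal sketch of what a channel-based check might look like, assuming the caller already holds the FileChannel it writes through; the overload, its signature, and the imports are illustrative guesses based on the code above, not a committed fix:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

import org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException;
import org.apache.hadoop.ozone.container.common.helpers.ChunkInfo;

import static org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos.Result.CHUNK_FILE_INCONSISTENCY;

// Illustrative variant of the check that asks the open FileChannel for the
// current size instead of going back to the filesystem via File.length().
public final class ChannelBasedCheck {

  public static void validateChunkSize(FileChannel channel, ChunkInfo chunkInfo,
      String fileName) throws StorageContainerException {
    long offset = chunkInfo.getOffset();
    long len;
    try {
      // The channel is the same handle the writer uses, so it reflects every
      // write already issued through it.
      len = channel.size();
    } catch (IOException e) {
      throw new StorageContainerException(
          "Failed to read size of " + fileName, CHUNK_FILE_INCONSISTENCY);
    }
    if (len != offset) {
      throw new StorageContainerException(
          "Chunk file offset " + offset + " does not match blockFile length " + len,
          CHUNK_FILE_INCONSISTENCY);
    }
  }
}
```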
It is hard to reproduce this problem with simple tests; it only shows up under heavy load. However, I did a small experiment to show the difference between `fileChannel.size()` and `file.length()`; the small program can be found in the attachments.
From the experiment results, I see that when `fileChannel.size()` = 1 MB, `file.length()` returns 0, which is exactly the case we hit here.
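The attached program is not reproduced here. As a stand-alone illustration of one way the two calls can diverge (an assumption for demonstration only, not necessarily the same race as under heavy load): on a POSIX filesystem, a file that is unlinked while its channel is still open reports 0 through `File.length()` but the written size through `FileChannel.size()`:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Demonstrates File.length() and FileChannel.size() disagreeing: once the path
// is unlinked (POSIX), File.length() returns 0 while the still-open channel
// keeps reporting the size of the data written through it.
public class ChannelVsFileLength {
  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("block", ".block");
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
         FileChannel channel = raf.getChannel()) {
      channel.write(ByteBuffer.allocate(1024 * 1024));   // write 1 MB through the channel
      file.delete();                                     // unlink the path while the channel stays open
      System.out.println("fileChannel.size() = " + channel.size()); // 1048576
      System.out.println("file.length()      = " + file.length());  // 0
    }
  }
}
```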
Attachments