
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      We are stressing an Ozone cluster with heavy EC writes and reads, and we hit the following exception on exactly one DN:

      2022-04-14 15:02:04,422 [ChunkReader-46] INFO org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler: Operation: WriteChunk , Trace ID:  , Message: Chunk file offset 1048576 does not match blockFile length 0 , Result: CHUNK_FILE_INCONSISTENCY , StorageContainerException Occurred.
      org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Chunk file offset 1048576 does not match blockFile length 0
              at org.apache.hadoop.ozone.container.keyvalue.helpers.ChunkUtils.validateChunkSize(ChunkUtils.java:391)
              at org.apache.hadoop.ozone.container.keyvalue.impl.FilePerBlockStrategy.writeChunk(FilePerBlockStrategy.java:140)
              at org.apache.hadoop.ozone.container.keyvalue.impl.ChunkManagerDispatcher.writeChunk(ChunkManagerDispatcher.java:73)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleWriteChunk(KeyValueHandler.java:705)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:224)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:188)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:307)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$14(HddsDispatcher.java:169)
              at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
              at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:168)
              at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:57)
              at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:50)
              at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:255)
              at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
              at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:309)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:292)
              at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:782)
              at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
              at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      2022-04-14 15:02:04,441 [ChunkReader-46] WARN org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer: Moving container /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614 to state UNHEALTHY from state:UNHEALTHY Trace:java.lang.Thread.getStackTrace(Thread.java:1559) 

      And this results in an UNHEALTHY container replica:

      [ozoneadmin@TENCENT64 ~/ozone-1.3.0-SNAPSHOT]$ ./bin/ozone admin container info 614
      Container id: 614
      Pipeline id: bddd5494-af98-46b2-a4c3-6757f129d3d2
      Container State: CLOSED
      Datanodes: [...]
      Replicas: [State: CLOSED; ReplicaIndex: 7; Origin: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad; Location: d4a6d2fb-3cc3-4e31-a9a6-7c3cda56c7ad/9.27.141.78,
      State: CLOSED; ReplicaIndex: 10; Origin: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea; Location: 45784b6a-06d7-4d76-b53e-1bc10a7e22ea/9.27.142.97,
      State: CLOSED; ReplicaIndex: 9; Origin: 65024339-4831-43c0-9fc0-752f8999d9c4; Location: 65024339-4831-43c0-9fc0-752f8999d9c4/9.27.140.135,
      State: CLOSED; ReplicaIndex: 8; Origin: e963db9e-44ea-425c-92e3-bdb5305c3049; Location: e963db9e-44ea-425c-92e3-bdb5305c3049/9.27.142.51,
      State: CLOSED; ReplicaIndex: 6; Origin: b6115b17-827c-484f-b61a-5f5849c5dbdb; Location: b6115b17-827c-484f-b61a-5f5849c5dbdb/9.27.141.121,
      State: UNHEALTHY; ReplicaIndex: 12; Origin: ae31f944-b6c9-4e18-b677-0b0edb2a8e13; Location: ae31f944-b6c9-4e18-b677-0b0edb2a8e13/9.27.141.78,
      State: CLOSED; ReplicaIndex: 2; Origin: 4a6af5ae-097a-4daa-9390-b0fbe87446f1; Location: 4a6af5ae-097a-4daa-9390-b0fbe87446f1/9.27.144.159,
      State: CLOSED; ReplicaIndex: 3; Origin: 3c3bc79e-3b5f-4f73-b502-7778a7993db6; Location: 3c3bc79e-3b5f-4f73-b502-7778a7993db6/9.27.140.38,
      State: CLOSED; ReplicaIndex: 14; Origin: 37db964c-0663-4b7c-915c-eaffccc32f55; Location: 37db964c-0663-4b7c-915c-eaffccc32f55/9.27.142.51,
      State: CLOSED; ReplicaIndex: 11; Origin: dc262c7c-184e-4f66-af98-2ece21deeab1; Location: dc262c7c-184e-4f66-af98-2ece21deeab1/9.27.140.222,
      State: CLOSED; ReplicaIndex: 13; Origin: e7ea47c7-6d33-490a-9d3a-3c8231c81e30; Location: e7ea47c7-6d33-490a-9d3a-3c8231c81e30/9.27.144.169,
      State: CLOSED; ReplicaIndex: 1; Origin: 8333d554-2e39-4ef6-b73a-01539e82a9d1; Location: 8333d554-2e39-4ef6-b73a-01539e82a9d1/9.27.144.226,
      State: CLOSED; ReplicaIndex: 4; Origin: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae; Location: 674aa6fb-4f27-4358-b7a7-af2bbf2683ae/9.27.141.230,
      State: CLOSED; ReplicaIndex: 5; Origin: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa; Location: 0d989a0e-3bb4-49ea-bacc-d468729bd9fa/9.27.140.135] 

      From the log, we can see that a block file received a write request at offset 1048576 while the DN saw the block file as having length 0.

      This should be impossible, because the EC client writes stripes one by one: it does not send the next stripe's chunks to the DNs before it has received success responses from the DNs for the previous stripe.

      I also checked the container directory for the block file: it has length exactly 1048576, not 0:

      [ozoneadmin@TENCENT64 ~]$ ls -t /data20/ozone/hdds/CID-ea6bec06-3f65-4721-a884-b3b9293f6a55/current/containerDir1/614/chunks/ -l
      total 1468508
      -rw-rw-r-- 1 ozoneadmin ozoneadmin  40894464 Apr 14 15:02 109611004723334158.block
      -rw-rw-r-- 1 ozoneadmin ozoneadmin  41943040 Apr 14 15:02 109611004723334160.block
      -rw-rw-r-- 1 ozoneadmin ozoneadmin  39845888 Apr 14 15:02 109611004723334157.block
      -rw-rw-r-- 1 ozoneadmin ozoneadmin   1048576 Apr 14 15:02 109611004723334165.block       <-- the target block file 

      So I suspect that the problem lies in the check done by the DN itself:

      public static void validateChunkSize(File chunkFile, ChunkInfo chunkInfo)
          throws StorageContainerException {
        long offset = chunkInfo.getOffset();
        // Suspicious: the length is read via File#length(), i.e. through
        // filesystem metadata, not through the FileChannel we write with.
        long len = chunkFile.length();
        if (chunkFile.length() != offset) {
          throw new StorageContainerException(
              "Chunk file offset " + offset + " does not match blockFile length " +
              len, CHUNK_FILE_INCONSISTENCY);
        }
      } 

      Here we get the length of the existing block file via `chunkFile.length()`, which is suspicious: we write to block files through FileChannels, so we should also get the file size through the channel. When we call `chunkFile.length()`, the channel may not have created (or flushed metadata for) the block file yet, so we may read 0 here. This may be the cause of the problem.
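
      If that is the case, one possible direction could be to validate against the FileChannel instead of the File. The following is just a sketch of the idea (not a final patch; the plumbing of the open channel from FilePerBlockStrategy down into the check is left out):

      // Sketch only: FileChannel#size() asks the open file descriptor, so it
      // reflects everything already written through that channel, while
      // File#length() goes through filesystem metadata and may still report 0.
      public static void validateChunkSize(FileChannel fileChannel,
          ChunkInfo chunkInfo) throws StorageContainerException {
        long offset = chunkInfo.getOffset();
        long fileLen;
        try {
          fileLen = fileChannel.size();
        } catch (IOException e) {
          throw new StorageContainerException(
              "Cannot get the size of the block file", CHUNK_FILE_INCONSISTENCY);
        }
        if (fileLen != offset) {
          throw new StorageContainerException(
              "Chunk offset " + offset + " does not match blockFile length " +
              fileLen, CHUNK_FILE_INCONSISTENCY);
        }
      }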

      It is hard to reproduce this problem with simple tests; it only shows up under heavy load. But I did a small experiment to show the difference between `fileChannel.size()` and `file.length()`; the small program can be found in the attachments (FileChannelSize.java).

      From the experiment results, I see that when `fileChannel.size()` = 1 MB, we get `file.length()` = 0, which is exactly the case we hit here.
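
      For reference, the gist of the comparison is sketched below (my reconstruction in plain form, not the attached FileChannelSize.java; the file name test.block is arbitrary):

      import java.io.File;
      import java.io.IOException;
      import java.io.RandomAccessFile;
      import java.nio.ByteBuffer;
      import java.nio.channels.FileChannel;

      // Minimal sketch (not the attached FileChannelSize.java): write 1 MB
      // through a FileChannel and print FileChannel#size() next to
      // File#length() while the channel is still open, to see whether the
      // two views of the file length can disagree.
      public class ChannelVsFileLength {
        public static void main(String[] args) throws IOException {
          File file = new File("test.block");  // arbitrary test file
          try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
               FileChannel channel = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(1024 * 1024);  // 1 MB of zeros
            channel.write(buf, 0);  // write the whole buffer at offset 0

            System.out.println("channel.size() = " + channel.size());
            System.out.println("file.length()  = " + file.length());
          } finally {
            file.delete();
          }
        }
      }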

       

      Attachments

        FileChannelSize.java (6 kB, Mark Gui)
