We have two approaches:
1. The client is unaware of how much room remains in the pre-existing last crc chunk. The client buffers (as usual) all new data written by the application, and when a crc chunk is full, it sends it to the datanode(s). The datanode(s) know that part of this newly arriving chunk has to be appended to the last partial crc chunk that already exists on disk: each one reads the last partial crc chunk from disk, appends as much of the new data as fits into that crc chunk, and writes the crc chunk back. This logic needs to be executed only by the primary (first) datanode in the pipeline.
The advantage of this approach is that multiple concurrent appenders can be supported in the future. The disadvantage is that the crc has to be computed twice: once by the client and again by the primary datanode.
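The datanode-side merge in approach 1 can be sketched as follows. This is an illustrative model only, not the actual datanode code: the chunk size, the function name merge_into_last_chunk, and the use of zlib.crc32 as the checksum are all assumptions standing in for the real block-write path.

```python
import zlib

CHUNK_SIZE = 512  # assumed bytes of data covered by one crc chunk

def merge_into_last_chunk(last_partial, incoming):
    """Sketch of approach 1 on the primary datanode: fill the existing
    partial chunk with incoming bytes, then split the remainder into
    fresh chunks. Returns a list of (chunk, crc) pairs."""
    room = CHUNK_SIZE - len(last_partial)
    merged = last_partial + incoming[:room]
    chunks = [merged]
    rest = incoming[room:]
    # Any bytes that did not fit start new chunks.
    for i in range(0, len(rest), CHUNK_SIZE):
        chunks.append(rest[i:i + CHUNK_SIZE])
    # The datanode recomputes the crc of the merged chunk from scratch;
    # this is the duplicated crc work that is the cost of approach 1.
    return [(c, zlib.crc32(c)) for c in chunks]
```

For example, appending 600 bytes to a 500-byte partial chunk (with a 512-byte chunk size) yields one rewritten full chunk plus two new chunks, and the crc of the first chunk covers both old and new bytes.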
2. In the second approach, the client fetches the contents of the last partial crc chunk from the datanode (and buffers it) when the file is first opened for append. It then appends newly written data to this buffered chunk. When the chunk is full, it sends it to the datanode pipeline.
The advantage of this approach is that crcs are generated in only one place, the client. The disadvantage is that supporting multiple concurrent appenders becomes infeasible, since each appender would hold its own copy of the partial chunk.
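The client side of approach 2 can be sketched as below. Again a hypothetical model, not the real client: the AppendClient class, the fetch_last_chunk callback standing in for the RPC that retrieves the partial chunk, the chunk size, and zlib.crc32 as the checksum are all assumptions.

```python
import zlib

CHUNK_SIZE = 512  # assumed bytes of data covered by one crc chunk

class AppendClient:
    """Sketch of approach 2: the client seeds its buffer with the last
    partial crc chunk fetched from the datanode, then streams full
    chunks (with client-computed crcs) down the pipeline."""

    def __init__(self, fetch_last_chunk):
        # fetch_last_chunk is a stand-in for the datanode RPC issued
        # when the file is first opened for append.
        self.buf = bytearray(fetch_last_chunk())
        self.sent = []  # (chunk, crc) pairs pushed to the pipeline

    def write(self, data):
        self.buf.extend(data)
        while len(self.buf) >= CHUNK_SIZE:
            chunk = bytes(self.buf[:CHUNK_SIZE])
            del self.buf[:CHUNK_SIZE]
            # Only the client computes the crc; datanodes never have to
            # re-read and re-checksum the partial chunk on disk.
            self.sent.append((chunk, zlib.crc32(chunk)))
```

Note that the buffer is per-client state: a second appender opening the same file would fetch and mutate its own copy of the partial chunk, which is why this approach rules out concurrent appenders.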