The full truncate feature may be hard to implement. Below are some ideas.
(1) Support only block boundary truncate.
When the length is not a multiple of the block size, throw an exception.
This is very easy to implement. The client could use it to get full truncate by (i) copying the data in the partially truncated last block, (ii) truncating to the block boundary, and (iii) appending the data back.
Example 1: Suppose the file size is 290 and the block size is 100. Then truncate(file, 180) can be done by (i) reading the 80 bytes at positions 101 to 180, (ii) calling truncate(file, 100), and (iii) appending those 80 bytes back.
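As a sketch, the client-side workaround in (i)-(iii) could look like the following, assuming the block-boundary truncate from (1) is exposed as FileSystem.truncate(path, length) and that append is enabled on the cluster; the class and method names are illustrative only.

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Client-side full truncate built on block-boundary-only truncate, as in (1). */
public class ClientSideTruncate {
  public static void truncateTo(FileSystem fs, Path file, long newLength, long blockSize)
      throws Exception {
    long boundary = (newLength / blockSize) * blockSize; // largest block multiple <= newLength
    byte[] tail = new byte[(int) (newLength - boundary)];

    // (i) copy the data in the partially truncated last block
    try (FSDataInputStream in = fs.open(file)) {
      in.readFully(boundary, tail); // read bytes [boundary, newLength)
    }

    // (ii) truncate to the block boundary (the only case the server supports);
    // assumed to complete immediately since the length is a block multiple
    fs.truncate(file, boundary);

    // (iii) append the preserved bytes back
    try (FSDataOutputStream out = fs.append(file)) {
      out.write(tail);
    }
  }
}

Note that a failure between (ii) and (iii) leaves the file truncated at the boundary with the tail bytes lost, which is exactly the atomicity problem (2) addresses.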
(2) Truncate with concat. (Sanjay's idea)
A problem with (1) is that it is not atomic: the client may finish (ii) but fail at (iii). A remedy is to add a parameter so that the API becomes truncate(file, length, concatFile), where length must be a multiple of the block size. HDFS first truncates file to length and then concatenates the blocks of concatFile to the end of file. Since this only updates namespace metadata, it is a namenode-only operation and can easily be made atomic.
Example 2: For the case in Example 1, truncate can be done by first copying bytes 101 to 180 to a new file f and then calling truncate(file, 100, f).
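A sketch of that flow is below. The three-argument truncate(file, length, concatFile) is the proposed API and does not exist yet, so it appears here as a clearly-marked stub; the temporary file name is likewise illustrative.

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch of the truncate-with-concat flow from (2). */
public class TruncateWithConcat {
  public static void truncateTo(FileSystem fs, Path file, long newLength, long blockSize)
      throws Exception {
    long boundary = (newLength / blockSize) * blockSize;
    byte[] tail = new byte[(int) (newLength - boundary)];

    // Copy the bytes past the boundary into a new single-block file f.
    try (FSDataInputStream in = fs.open(file)) {
      in.readFully(boundary, tail);
    }
    Path f = new Path(file.getParent(), "." + file.getName() + ".tail"); // illustrative name
    try (FSDataOutputStream out = fs.create(f, true)) {
      out.write(tail);
    }

    // Atomic namenode-only step: truncate file to the boundary and splice in f's block.
    truncate(fs, file, boundary, f);
  }

  /** Stand-in for the proposed three-argument truncate API from (2). */
  static void truncate(FileSystem fs, Path file, long length, Path concatFile) {
    throw new UnsupportedOperationException("proposed truncate-with-concat API");
  }
}

A failure before the final call leaves file untouched (only the temporary f may leak), and the final call itself is atomic at the namenode, which is what (1) cannot guarantee.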
(3) Copy on truncate.
Support full truncate by first copying the last block to a new block, then truncating the replicas of the new block at the datanodes, and finally committing the truncate once the datanodes report the new block to the namenode. Rollback is possible since the old block is still around. This is harder to implement than (1) or (2).
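The real mechanics involve new block IDs, replica copies at the datanodes, and a namenode commit, but the rollback property can be illustrated with a small local-filesystem analogy; everything below is illustrative, with a rename standing in for the namenode's commit, and is not HDFS code.

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/**
 * Local-filesystem analogy of copy-on-truncate: never shorten the original
 * block in place; copy it, truncate the copy, and commit with an atomic swap.
 * Until the commit, the old block is intact, so rollback is trivial.
 */
public class CopyOnTruncate {
  static void truncateLastBlock(Path block, long newLen) throws IOException {
    Path copy = block.resolveSibling(block.getFileName() + ".copy");
    Files.copy(block, copy, StandardCopyOption.REPLACE_EXISTING); // copy the last block
    try (FileChannel ch = FileChannel.open(copy, StandardOpenOption.WRITE)) {
      ch.truncate(newLen); // truncate the copy only; the original is untouched
    }
    // Commit: replace the old block with the truncated copy (atomic replace on
    // POSIX filesystems). In HDFS this corresponds to the namenode committing
    // the new block after the datanodes report it. Rollback before this point
    // is just deleting the copy.
    Files.move(copy, block, StandardCopyOption.ATOMIC_MOVE);
  }
}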