> User can go to reference link to understand what is checkpoint and what does this command do
The provided link is to the HDFS architecture guide. In that guide, if you search for "trash", it tells you that deleting a file will actually move it into the trash. True, but not helpful. If you search for "checkpoint", it tells you about the NameNode's edit log checkpointing. Also true, but not helpful. The only thing I was able to find that clarified in specific terms what a trash checkpoint is was the source code. Googling around a bit, I found a couple of forum and blog posts that do explain bits of how the trash works in HDFS, and taken together they give a pretty clear picture, but that's a handful of hits in a sea of "it empties the trash."
My point is that this is an excellent opportunity to create a useful source of documentation on what happens when you use -expunge. I think these docs would be much more helpful if they said something like:
- The trash folder is divided into "checkpoints" that contain the files deleted during given time windows
- Every fs.trash.checkpoint.interval minutes, HDFS will create a new checkpoint, and all files subsequently deleted will go there
- Every fs.trash.interval minutes, HDFS will delete all checkpoints older than fs.trash.interval and then create a new checkpoint
- hdfs dfs -expunge will cause HDFS to delete all checkpoints older than fs.trash.interval
I didn't think too hard about the phrasing, but you get my point. Provide enough information that a user can understand what a checkpoint is and why they'd want to expunge one without having to go on a Googlequest or read source code.
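If it helps, the model in the bullets above can be sketched as a toy simulation. This is only an illustration of the behavior as I described it, not Hadoop's actual implementation: `ToyTrash`, `Checkpoint`, and the method names are made up for the example, times are plain integer minutes, and `trash_interval` stands in for fs.trash.interval.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    created_at: int                            # minute the checkpoint was opened
    files: list = field(default_factory=list)  # paths deleted while it was the newest


class ToyTrash:
    """Toy model of the bullets above; hypothetical, not Hadoop code."""

    def __init__(self, trash_interval):
        self.trash_interval = trash_interval   # stands in for fs.trash.interval
        self.checkpoints = []

    def new_checkpoint(self, now):
        # Subsequent deletions land in this checkpoint.
        self.checkpoints.append(Checkpoint(created_at=now))

    def delete(self, path, now):
        # "Deleting" a file moves it into the newest checkpoint.
        if not self.checkpoints:
            self.new_checkpoint(now)
        self.checkpoints[-1].files.append(path)

    def expunge(self, now):
        # Drop every checkpoint older than the trash interval.
        cutoff = now - self.trash_interval
        self.checkpoints = [c for c in self.checkpoints if c.created_at >= cutoff]

    def files(self):
        # Everything still recoverable from the trash.
        return [f for c in self.checkpoints for f in c.files]


trash = ToyTrash(trash_interval=60)
trash.delete("/data/a.txt", now=0)    # opens a checkpoint at minute 0
trash.new_checkpoint(now=30)          # periodic checkpoint
trash.delete("/data/b.txt", now=45)   # goes into the minute-30 checkpoint
trash.expunge(now=70)                 # minute-0 checkpoint is now older than 60 minutes
print(trash.files())                  # ['/data/b.txt']
```

The point of the sketch is that expunging is age-based: a.txt is gone for good because its checkpoint aged out, while b.txt is still recoverable.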