Thanks for the comments, guys.
In some of the most commercially popular systems which implement snapshots, snapshots do not count against the disk quotas
How do they handle disk quota use when the original file is deleted and only snapshots exist? That is why counting snapshots against the disk quota makes sense.
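To make the quota concern concrete, here is a minimal sketch of the accounting question. It is an illustrative model only, not the HDFS implementation: the file ids, sizes, and the `quota_usage` helper are hypothetical, and replication and block-level sharing are ignored.

```python
def quota_usage(live_files, snapshots, count_snapshots=True):
    """Return the bytes charged against a quota.

    live_files: dict of file id -> size for the current namespace.
    snapshots:  list of dicts (file id -> size), one per snapshot.
    A file referenced by both the live namespace and a snapshot is
    charged once. (Hypothetical model, for illustration only.)
    """
    charged = dict(live_files)
    if count_snapshots:
        for snap in snapshots:
            for file_id, size in snap.items():
                # Charge snapshot-only files; skip ones already counted.
                charged.setdefault(file_id, size)
    return sum(charged.values())


# Two files exist, a snapshot is taken, then f1 is deleted.
live = {"f1": 100, "f2": 50}
snap = dict(live)   # the snapshot still references both files
del live["f1"]      # f1 now exists only in the snapshot

with_snapshots = quota_usage(live, [snap], count_snapshots=True)
without_snapshots = quota_usage(live, [snap], count_snapshots=False)
```

In this model the data for `f1` still occupies disk after the delete, so the two accounting choices diverge: one charges the user for it and the other does not.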
First, I'm concerned with the O(# of files + # of directories) nature of this design, both in terms of time taken to create a snapshot and the NN memory resources consumed.
I agree with you on this. We wanted to begin with this approach and then optimize its memory use further. The initial patch uploaded here attempted premature optimization of both memory and snapshot creation time, which made the code really complicated. Reducing this cost is still a definite goal, and we will update that part of the design as the work continues. This is covered in the open issues/future work section.
Agree with this part. As we continue the work, we can make a decision on this. Let's not make the design/implementation more complicated just to support RW.
Will address this as we continue to add more details to the design in the next update.
Comment 3, 6:
I want to make sure you understand that this is an early design and we will continue to add more details. I think some of the questions will be answered by how this works:
- An admin can mark directories as snapshottable using the CLI.
- Users can then create snapshots for these directories using the CLI/API. A snapshot has a name, and the name is unique within a given snapshot root.
If you look at snapshot implementations in other systems, snapshots are taken at the volume level. That is the parallel we are talking about.
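The workflow above can be sketched as a small in-memory model. This is only an illustration of the two steps and the name-uniqueness rule; `SnapshotRegistry`, its method names, and the exception type are hypothetical and are not the HDFS CLI or API.

```python
class SnapshotNameError(ValueError):
    """Raised when a snapshot name is already used under the same root."""


class SnapshotRegistry:
    """Hypothetical model of snapshottable directories (not HDFS code)."""

    def __init__(self):
        # snapshottable root path -> set of snapshot names under that root
        self._snapshots = {}

    def allow_snapshot(self, root):
        """Admin step: mark a directory as snapshottable."""
        self._snapshots.setdefault(root, set())

    def create_snapshot(self, root, name):
        """User step: create a named snapshot under a snapshottable root."""
        if root not in self._snapshots:
            raise PermissionError(f"{root} is not snapshottable")
        if name in self._snapshots[root]:
            raise SnapshotNameError(f"{name!r} already exists under {root}")
        self._snapshots[root].add(name)
        return (root, name)


registry = SnapshotRegistry()
registry.allow_snapshot("/data/a")
registry.allow_snapshot("/data/b")
first = registry.create_snapshot("/data/a", "s1")
# The same name is fine under a different snapshot root.
second = registry.create_snapshot("/data/b", "s1")
```

Keying the names by snapshot root is what makes each snapshottable directory behave like its own small "volume" in this sketch.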
Comment 5, 7, 10:
With regard to consistency (comment 7), a system where the snapshot is taken at the namespace level without involving the data layer cannot provide a strong consistency guarantee. I also think it may not be relevant when the writers are different from the client that is taking the snapshot; I am not sure what guarantee such a client can expect or depend on, given that the writers are separate. We could discuss this during the design review. Based on discussion with a few HBase folks, I think they should be okay with it, but it is something to discuss with them. I am also not clear on their dependency on HDFS in hbase-6055.
This could change during implementation if we decide that access time is not that important to maintain.
Agreed. I am leaning towards allowing it.
Will add use cases.
See the volume comment above; the document also partly covers this. We could discuss this further if the document is not clear.