Thanks a lot for the great comments, Andrew Wang! Let me try to answer some of the questions here, and I believe Tsz Wo Nicholas Sze will provide more details later.
When does the Mover actually migrate data? When a block is finalized? When the file is closed? Some amount of time after? When the admin decides to run the Mover?
Currently the data is only migrated when the admin runs the Mover.
What is the load impact of scanning the namespace for files that need to be migrated? A naive ls -R / type operation could be bad.
Yeah, scanning the namespace is definitely a big burden here.
HDFS-6875 adds support for users to specify a list of paths to migrate. In the future we may also want to support running multiple Movers concurrently for disjoint directories, or even utilizing MR.
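To illustrate, a sketch of the admin workflow, assuming the `hdfs mover` CLI and the `-p` option added by HDFS-6875 (the paths here are made up):

```
# Scan the whole namespace and migrate any blocks that violate their policy:
hdfs mover

# With HDFS-6875, restrict the scan to specific paths to reduce NN load:
hdfs mover -p /archive /logs/2014
```

Restricting the scan with `-p` is what makes the namespace-scan cost manageable for large namespaces.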
Why are policies specified in XML files rather than in the fsimage / edit log? It seems very important to keep the policies consistent, and this is thus one more file that needs to be synchronized and backed up. Stashing it in the editlog would do this for you.
Agree. Actually Nicholas and I had a discussion about this before, and I have an unfinished preliminary patch, but we still need to think through some details. We plan to finish this work after the merge.
Can storage policies be set at a directory level? Testing to confirm this either way?
Yes, this has been done in
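To make the directory-level behavior concrete, a sketch assuming the `hdfs storagepolicies` admin CLI (the directory name is made up; files under the directory inherit its policy unless they set their own):

```
# Assign the COLD policy to a directory:
hdfs storagepolicies -setStoragePolicy -path /archive -policy COLD

# Verify the effective policy:
hdfs storagepolicies -getStoragePolicy -path /archive
```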
How does this interact with snapshots? With replication factor, I believe we use the maximum replication factor across all snapshots. Here, would it be the union of all storage types across all snapshots? Not sure how the Mover accounts for this, or if a full-union is the right policy.
This has been addressed in HDFS-6969. Please see the discussion there.
Do we have per-storage-type quotas? Are there APIs exposed to show, for instance, storage type usage by a snapshot, by a directory, etc?
This is a very good suggestion, especially since we already have the SSD storage type and may add a MEMORY storage type in the future.
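For reference, a per-storage-type quota interface might look like the following (hypothetical at the time of this comment; the syntax is modeled on the existing `dfsadmin` space-quota commands, and the path is made up):

```
# Cap the SSD usage of a directory at 10 GB, independent of its overall quota:
hdfs dfsadmin -setSpaceQuota 10g -storageType SSD /hot-data

# Clear the SSD-specific quota again:
hdfs dfsadmin -clrSpaceQuota -storageType SSD /hot-data
```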
How does this interact with open files?
Actually, we should ignore incomplete blocks, which can be inferred from LocatedBlocks. I will file a new jira for this. Thanks!
In another scenario, if a block is appended to during the migration, the newly migrated replica will be marked as corrupt when it is reported to the NN, because its generation stamp no longer matches.
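To make the open-file handling concrete, here is a minimal self-contained sketch of the filtering step. This is not the actual Mover code: `BlockInfo` and `migratableBlocks` are hypothetical stand-ins; in real HDFS the "is the last block finalized" signal would come from the file's LocatedBlocks (e.g. `isLastBlockComplete()`).

```java
import java.util.ArrayList;
import java.util.List;

public class MoverSkipSketch {
  // Minimal stand-in for the per-block info the Mover would see.
  static final class BlockInfo {
    final long blockId;
    final boolean complete; // finalized, i.e. not under construction
    BlockInfo(long blockId, boolean complete) {
      this.blockId = blockId;
      this.complete = complete;
    }
  }

  // Keep only finalized blocks. An under-construction last block is skipped
  // so the Mover never copies a replica whose generation stamp may still
  // change (which would make the migrated replica look corrupt to the NN).
  static List<BlockInfo> migratableBlocks(List<BlockInfo> located) {
    List<BlockInfo> result = new ArrayList<>();
    for (BlockInfo b : located) {
      if (b.complete) {
        result.add(b);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<BlockInfo> blocks = new ArrayList<>();
    blocks.add(new BlockInfo(1L, true));
    blocks.add(new BlockInfo(2L, false)); // still being written: skip it
    System.out.println(migratableBlocks(blocks).size()); // prints 1
  }
}
```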