Details
- Type: Story
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.0.0
- Fix Version/s: None
Description
Ambari maintains a file, /etc/hadoop/conf/dfs_data_dir_mount.hist, that maps each HDFS data dir to its last known mount point.
This mapping is used to detect when a data dir becomes unmounted, in order to prevent HDFS from writing to the root partition.
Consider the example of a data node configured with these volumes:
/dev/sda -> /
/dev/sdb -> /grid/0
/dev/sdc -> /grid/1
/dev/sdd -> /grid/2
Typically, each /grid/#/ directory contains a data folder.
If a volume fails and hdfs-site.xml sets dfs.datanode.failed.volumes.tolerated to a value > 0, the DataNode will tolerate the failure; otherwise, the DataNode will die.
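For illustration, the detection logic based on the history file can be sketched roughly as follows. This is a minimal sketch, not Ambari's actual implementation; the one-pair-per-line `data_dir,mount_point` file format and the helper names are assumptions:

```python
import os

# Path named in this ticket; the format below is an assumption.
HIST_FILE = "/etc/hadoop/conf/dfs_data_dir_mount.hist"

def get_mount_point(path):
    """Walk up from path until an actual mount point is found."""
    path = os.path.abspath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

def find_unmounted_dirs(data_dirs, hist_lines):
    """Return data dirs whose current mount differs from the last known one.

    hist_lines: lines of the history file, each "data_dir,last_mount".
    """
    last_known = {}
    for line in hist_lines:
        line = line.strip()
        if line and not line.startswith("#"):
            data_dir, mount = line.split(",", 1)
            last_known[data_dir] = mount

    unmounted = []
    for d in data_dirs:
        # A missing dir resolves to the root partition for this check.
        current = get_mount_point(d) if os.path.exists(d) else "/"
        if d in last_known and last_known[d] != current:
            # The drive dropped away and the dir now falls on another
            # (likely root) partition.
            unmounted.append(d)
    return unmounted
```

With the example layout above, if /dev/sdb disappears, /grid/0/data would resolve to "/" while the history file still records /grid/0, so the dir is flagged.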
In AMBARI-12252, I fixed a bug so that Ambari prevents HDFS from writing to the root partition when a drive becomes unmounted.
However, that approach relies on the /etc/hadoop/conf/dfs_data_dir_mount.hist file existing and on the original configuration being correct.
The ideal way to fix this is to:
- Track which data dirs the admin wants mounted on a non-root partition. If the admin wants all data dirs on non-root mounts but the initial install is incorrect, report this as a problem.
- Keep the history of the mount points in the database. Today, if the cache file is deleted or the host is reimaged, this information is lost.
- Introduce a new state between FAILED and COMPLETED, such as COMPLETED_WITH_ERRORS, that renders tasks differently in the UI so the user can clearly see when a critical but non-fatal error occurred.
- Plug in to the Alert Framework.
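The first proposed check, flagging data dirs that land on the root partition when the admin intended non-root mounts, might look roughly like this. The function name, parameters, and return shape are hypothetical, not an existing Ambari API:

```python
import os

def validate_data_dir_mounts(data_dirs, require_non_root=True):
    """Report data dirs whose nearest mount point is the root partition.

    If the admin intends all data dirs to live on non-root mounts
    (require_non_root=True), any dir resolving to "/" indicates a
    misconfigured or incomplete install and should surface as a
    problem (e.g. an alert, or a COMPLETED_WITH_ERRORS task state).
    """
    def mount_of(path):
        # Walk up until an actual mount point is found.
        path = os.path.abspath(path)
        while not os.path.ismount(path):
            path = os.path.dirname(path)
        return path

    problems = [d for d in data_dirs if mount_of(d) == "/"]
    if require_non_root and problems:
        return ("WARN", problems)
    return ("OK", [])
```

Persisting the expected mounts in the database, rather than only in the on-host cache file, would let this check survive a deleted file or a reimaged host.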
Attachments
Issue Links
- relates to: AMBARI-12252 Prevent datanode from creating an HDFS datadir when drive becomes unmounted (Resolved)