[AMBARI-12252] Prevent datanode from creating an HDFS datadir when drive becomes unmounted - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.7.0
Fix Version/s: 2.1.0
Component/s: ambari-agent
Labels: None

Description

Ambari maintains a file, /etc/hadoop/conf/dfs_data_dir_mount.hist,
that maps each HDFS data dir to its last known mount point.

This is used to detect when a data dir becomes unmounted, in order to prevent HDFS from writing to the root partition.

Consider the example of a data node configured with these volumes:

/dev/sda -> /
/dev/sdb -> /grid/0
/dev/sdc -> /grid/1
/dev/sdd -> /grid/2

Typically, each /grid/#/ directory contains a data folder.
Today, if a data directory becomes unmounted, the directory no longer exists and Ambari does not create it automatically. Ambari simply logs a warning and updates its cache with the new mount point, which is /. That cache update is the underlying bug.
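
To make the mount-point lookup concrete, here is a minimal Python sketch that resolves a data dir to its mount point by longest-prefix match. The function name mount_point_for and the hard-coded mount lists are assumptions for illustration, not code from the Ambari agent or the attached patch. The path /grid/1/data is likewise assumed from the "data folder" convention above.

```python
import posixpath


def mount_point_for(path, mount_points):
    """Return the longest mount point that contains `path` (hypothetical helper)."""
    path = posixpath.normpath(path)
    best = None
    for mp in mount_points:
        mp = posixpath.normpath(mp)
        # "/" contains every absolute path; any other mount point must match
        # the path exactly or be followed by a path separator.
        if mp == "/" or path == mp or path.startswith(mp + "/"):
            if best is None or len(mp) > len(best):
                best = mp
    return best


# Mount table from the example above (assumed to be read from the OS in practice).
all_mounted = ["/", "/grid/0", "/grid/1", "/grid/2"]
# Same host after /dev/sdc (/grid/1) has become unmounted.
sdc_unmounted = ["/", "/grid/0", "/grid/2"]

before = mount_point_for("/grid/1/data", all_mounted)
after = mount_point_for("/grid/1/data", sdc_unmounted)
```

With /grid/1 unmounted, the lookup falls through to /, which is the value that ends up in the cache file.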

If hdfs-site contains dfs.datanode.failed.volumes.tolerated with a value > 0, then the DataNode will tolerate the failure; otherwise, the DataNode will die.

Because Ambari already has "/" in its cache file, the fact that the data dir used to be mounted on a non-root drive is lost. The next time the DataNode is restarted, Ambari creates the data dir, which now lives on the root partition. This is really bad because HDFS will then fill up the root drive.

The admin can still remount the partition, but then needs to restart DataNode so Ambari can update its cache.
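
The behavior this issue asks for can be sketched as a small decision function. This is a hedged illustration under stated assumptions, not the attached patch: plan_data_dir is a hypothetical helper, and the rule it encodes (refuse to create the dir and keep the old history entry when a previously non-root data dir now resolves to /) is inferred from the description above.

```python
def plan_data_dir(last_mount, current_mount):
    """Decide what to do with one data dir (hypothetical helper).

    last_mount    -- mount point recorded in the history file, or None if unknown
    current_mount -- mount point the data dir resolves to right now
    Returns (create_dir, mount_to_record).
    """
    was_non_root = last_mount is not None and last_mount != "/"
    if was_non_root and current_mount == "/":
        # The drive appears to have been unmounted: do not create the dir on
        # the root partition, and keep the old entry so the history survives.
        return False, last_mount
    # First install, unchanged mount, or a dir that was always on root.
    return True, current_mount


# /grid/1 was on its own drive, but the drive is now unmounted.
create, recorded = plan_data_dir("/grid/1", "/")
```

Keeping the old mount point on record means a later DataNode restart still sees that the data dir belongs on a non-root drive, rather than treating / as its normal location.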

The ideal way to fix this in Ambari 2.2 is as follows:

- Track which data dirs the admin wants mounted on a non-root partition. If the admin wants all data dirs on non-root mounts but the initial install is incorrect, this should be reported as a problem.
- Keep the history of the mount points in the database. Today, if the cache file is deleted or the host is reimaged, this information is lost.
- Introduce a new state between FAILED and COMPLETED, such as COMPLETED_WITH_ERRORS, so that tasks can be displayed differently in the UI and the user can clearly see when a critical but non-fatal error has happened.
- Plug in to the Alert Framework.

Attachments

AMBARI-12252.branch-2.1.patch (20 kB, 02/Jul/15 04:51, Alejandro Fernandez)
AMBARI-12252.patch (20 kB, 02/Jul/15 04:51, Alejandro Fernandez)

Issue Links

is related to: AMBARI-12267 Ambari to improve tracking of data dirs becoming unmounted (Open)

relates to: AMBARI-7506 Ambari DataNode shouldn't create dfs.data.dir paths after installation when path becomes unmounted (Resolved)

links to: Code Review patch

People

Assignee: Alejandro Fernandez

Reporter: Alejandro Fernandez

Votes: 0

Watchers: 4

Dates

Created: 02/Jul/15 04:48

Updated: 04/Jul/15 11:31

Resolved: 02/Jul/15 20:33