Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Won't Fix
-
0.17.0
-
None
-
None
Description
1. Allocate a cluster using Hadoop with DFS permissions turned on; the cluster is used by two users.
2. Run randomtextwriter and distcp jobs.
3. On deallocation, hod deallocate threw "Operation not permitted" but exited with exit code 0.
The output of the deallocate operation follows:
[
[2008-05-07 15:01:47,503] DEBUG/10 hadoop:595 - hadoop-ui-log-dir not specified. Skipping Hadoop UI log collection.
[2008-05-07 15:01:47,512] DEBUG/10 hadoop:616 - calling rm.stop
[2008-05-07 15:01:47,559] DEBUG/10 hadoop:618 - completed rm.stop
[2008-05-07 15:01:47,564] CRITICAL/50 hod:517 - op: deallocate cluster_dir failed: <type 'exceptions.OSError'> [Errno 1] Operation not permitted: '<path of hod.temp-dir>/<userid>.<cluster_id>'
[2008-05-07 15:01:47,569] DEBUG/10 hod:518 - Traceback (most recent call last):
File "/grid/0/hodqa/hod/hod-dev-20080414/hodlib/Hod/hod.py", line 510, in operation
getattr(self, "op%s" % opList[0])(opList)
File "/grid/0/hodqa/hod/hod-dev-20080414/hodlib/Hod/hod.py", line 365, in _op_deallocate
self.__cluster.deallocate(clusterDir, clusterInfo)
File "/grid/0/hodqa/hod/hod-dev-20080414/hodlib/Hod/hadoop.py", line 624, in deallocate
shutil.rmtree(tempDir)
File "/export/crawlspace/kryptonite/comps//python-2.5.1/lib/python2.5/shutil.py", line 178, in rmtree
onerror(os.rmdir, path, sys.exc_info())
File "/export/crawlspace/kryptonite/comps//python-2.5.1/lib/python2.5/shutil.py", line 176, in rmtree
os.rmdir(path)
OSError: [Errno 1] Operation not permitted: '<path of hod.temp-dir>/<userid>.<clusrter_id>'
[2008-05-07 15:01:47,511] DEBUG/10 hod:522 - return code: 0
]
The Torque job completed, and hod list shows the cluster as a dead cluster.
It seems that when a mapred job is run by a user other than the one who allocated the cluster, hod.temp-dir is created with the ownership of the user who ran the mapred jobs.
So when the deallocate operation is issued by the user who allocated the cluster, hod tries to remove the <hod.temp-dir>/<userid>.<cluster_id> directory, which fails and causes the deallocate operation to behave inconsistently: it logs a CRITICAL error but still returns exit code 0.
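The mismatch between the logged error and the exit code can be illustrated with a minimal sketch. It is not HOD's actual code: the function name remove_cluster_temp_dir, the constant CLEANUP_FAILED, and the injectable remove parameter are all hypothetical, and the sketch is written for current Python rather than the Python 2.5 shown in the traceback. It only shows one way a failed shutil.rmtree could be turned into a non-zero return code, with the directory owner reported to make the ownership problem visible.

```python
import errno
import os
import shutil
import sys

# Hypothetical exit code; HOD's real exit codes are not shown in this report.
CLEANUP_FAILED = 1


def remove_cluster_temp_dir(temp_dir, remove=shutil.rmtree):
    """Remove temp_dir; return 0 on success, CLEANUP_FAILED on OSError.

    The 'remove' parameter is an assumption added so the failure path
    can be exercised without needing a directory owned by another user.
    """
    try:
        remove(temp_dir)
    except OSError as err:
        detail = ""
        # For permission errors, report who owns the directory, since the
        # report suspects it belongs to the user who ran the mapred jobs.
        if err.errno in (errno.EPERM, errno.EACCES):
            try:
                detail = " (directory owned by uid %d)" % os.stat(temp_dir).st_uid
            except OSError:
                pass  # The directory may be gone or unreadable; skip the detail.
        sys.stderr.write("could not remove %s: %s%s\n" % (temp_dir, err, detail))
        return CLEANUP_FAILED
    return 0
```

A caller could then propagate the result, for example with sys.exit(remove_cluster_temp_dir(path)), so that a failed cleanup is visible to scripts that check the exit status.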