  1. Hadoop Common
  2. HADOOP-3367

hod deallocate does not execute properly and throws an "Operation not permitted" error when a hod-allocated cluster is shared by multiple users.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.17.0
    • Fix Version/s: None
    • Component/s: contrib/hod
    • Labels: None

    Description

      1. Allocate a cluster running Hadoop with DFS permissions on; the cluster is used by two users.
      2. Run randomtextwriter and distcp jobs.
      3. Try to deallocate the cluster. hod deallocate throws "Operation not permitted" but exits with exit code 0.
      The output of the deallocate operation follows:
      [
      [2008-05-07 15:01:47,503] DEBUG/10 hadoop:595 - hadoop-ui-log-dir not specified. Skipping Hadoop UI log collection.
      [2008-05-07 15:01:47,512] DEBUG/10 hadoop:616 - calling rm.stop
      [2008-05-07 15:01:47,559] DEBUG/10 hadoop:618 - completed rm.stop
      [2008-05-07 15:01:47,564] CRITICAL/50 hod:517 - op: deallocate cluster_dir failed: <type 'exceptions.OSError'> [Errno 1] Operation not permitted: '<path of hod.temp-dir>/<userid>.<cluster_id>'
      [2008-05-07 15:01:47,569] DEBUG/10 hod:518 - Traceback (most recent call last):
      File "/grid/0/hodqa/hod/hod-dev-20080414/hodlib/Hod/hod.py", line 510, in operation
      getattr(self, "op%s" % opList[0])(opList)
      File "/grid/0/hodqa/hod/hod-dev-20080414/hodlib/Hod/hod.py", line 365, in _op_deallocate
      self.__cluster.deallocate(clusterDir, clusterInfo)
      File "/grid/0/hodqa/hod/hod-dev-20080414/hodlib/Hod/hadoop.py", line 624, in deallocate
      shutil.rmtree(tempDir)
      File "/export/crawlspace/kryptonite/comps//python-2.5.1/lib/python2.5/shutil.py", line 178, in rmtree
      onerror(os.rmdir, path, sys.exc_info())
      File "/export/crawlspace/kryptonite/comps//python-2.5.1/lib/python2.5/shutil.py", line 176, in rmtree
      os.rmdir(path)
      OSError: [Errno 1] Operation not permitted: '<path of hod.temp-dir>/<userid>.<cluster_id>'
      [2008-05-07 15:01:47,511] DEBUG/10 hod:522 - return code: 0
      ]
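
The traceback shows shutil.rmtree raising OSError while the operation still reports return code 0. As a minimal sketch of one way the cleanup step could surface that failure to the caller, the helper below maps a failed removal to a non-zero code. The name remove_temp_dir and the specific return values are assumptions made for illustration; this is not HOD's actual code.

```python
import errno
import shutil
import sys


def remove_temp_dir(temp_dir):
    """Try to delete temp_dir; return 0 on success, non-zero on failure.

    Hypothetical helper for illustration only; not part of HOD.
    """
    try:
        shutil.rmtree(temp_dir)
    except OSError as exc:
        # EPERM / EACCES: the caller is not permitted to remove the path,
        # for example because another user owns it.
        if exc.errno in (errno.EPERM, errno.EACCES):
            sys.stderr.write("cannot remove %s: %s\n" % (temp_dir, exc.strerror))
            return 1
        # Any other failure is also reported with a non-zero code.
        sys.stderr.write("cleanup of %s failed: %s\n" % (temp_dir, exc))
        return 2
    return 0
```

A caller could pass the result on to sys.exit so that a failed deallocate is visible to scripts that check the exit status.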

      The Torque job completed, and hod list shows the cluster as a dead cluster.
      It seems that when a mapred job is run by a user other than the one who allocated the cluster, hod.temp-dir is created with the ownership of the user who ran the mapred jobs.
      So when the deallocate operation is fired by the user who allocated the cluster, hod tries to remove the <hod.temp-dir>/<userid>.<cluster_id> directory, which fails and causes the deallocate operation to behave oddly.
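
One way to check the ownership diagnosis is to compare the owner of the temp directory with the user running deallocate. The sketch below assumes a POSIX system; describe_owner is a hypothetical name, not a HOD function.

```python
import os
import pwd


def describe_owner(path):
    """Return (owner_name, is_current_user) for path, using POSIX ownership."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The UID has no passwd entry; fall back to the numeric id.
        owner = str(st.st_uid)
    # Compare against the effective UID of the running process.
    return owner, st.st_uid == os.geteuid()
```

If the second element is False for <hod.temp-dir>/<userid>.<cluster_id>, the directory belongs to someone other than the user running deallocate, which would be consistent with the failure described here.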

          People

            Assignee: Unassigned
            Reporter: Karam Singh (karams)