Hadoop Common / HADOOP-2899

[HOD] hdfs:///mapredsystem directory not cleaned up after deallocation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.17.0
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The mapred system directory generated by HOD is cleaned up at cluster deallocation time.

      Description

      Each submitted job creates an hdfs:///mapredsystem directory, created (I guess) by the hodring process. The problem is that it is not cleaned up at the end of the process; a use case would be:

      • user A allocates a cluster; the hodring runs on svrX, so a /mapredsystem/svrX directory is created
      • user A deallocates the cluster, but that directory is not cleaned up
      • user B allocates a cluster, and the first node chosen for the hodring is svrX, so the hodring tries to write to hdfs:///mapredsystem but it fails
      • allocation succeeds, but there's no hodring running; looking at
        0-jobtracker/logdir/hadoop.log under the temporary directory I can read:

      2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=B, access=WRITE, inode="mapredsystem":hadoop:supergroup:rwxr-xr-x

      I guess a possible solution would be to clean up those directories during the deallocation process.

      Attachments

      • 2899.1.patch (16 kB) - Hemanth Yamijala

        Activity

        Hemanth Yamijala added a comment - edited

        Hodring does not create the dfs directory, though it supplies the name to the jobtracker. The name is /mapredsystem/hostname. JobTracker creates this at startup time, but doesn't delete it when killed. However, when another jobtracker starts on the same hostname, it seems to be trying to clean it up in some way, and the current permissions set by JobTracker code prevent a read from happening, which is why there's a failure.

        Possible solutions:

        1. Make the directory world-readable. In HADOOP-1873, Doug commented that it should not be world-readable, so I guess this is not an option.
        2. JobTracker can delete the directory when it is shutdown. This seems, IMO, the best solution. Whoever creates, deletes.
        3. HOD can create a directory that's specific to the user, something like mapredsystem/user-name.clusterid, where clusterid could be something like a Torque job id. This may create a huge number of directories in HDFS; I don't know if that's an issue. Otherwise, this is probably the easiest solution to implement.
        4. HOD can delete the mapredsystem directory at deallocation. This seems wrong, because it doesn't create it, and further, given our current design, this is very hard to implement.
        5. JobTracker can avoid cleaning up the directory at startup. This may not be safe, in case there's a crash or something.

        Comments?

        Luca Telloli added a comment -

        My opinion is that solution 2 is probably the best after all.
        After all, I'm not sure exactly why this directory is created, so I might be wrong here, but if it's like a "lock" or a placeholder for the hodring node, I see solution 3 as potentially dangerous, in the sense that having that info in the user's space would not prevent the system from reusing the same resource for a different process run by a different user.

        Tsz Wo Nicholas Sze added a comment -
        • Hemanth, is jobConf.getSystemDir() the dfs directory (/mapredsystem/hostname) you mentioned above? If that is the case, I believe we should implement solution 2.
        • In addition, we should also implement solution 3: the path should be unique across HOD. If we use /mapredsystem/hostname, I guess two users on the same host cannot allocate clusters at the same time. Something like /mapredsystem/user-name.clusterid seems good. It might be even better to append a random number at the end.
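
        As a rough illustration of the naming scheme suggested above, a per-user, per-allocation path could be built along these lines (a sketch only; the helper name and random suffix are illustrative, not HOD's actual code):

            import getpass
            import random

            def unique_mapred_system_dir(cluster_id, base="/mapredsystem"):
                """Return e.g. /mapredsystem/<user>.<cluster_id>.<random> to avoid clashes."""
                suffix = random.randint(0, 9999)
                return "%s/%s.%s.%d" % (base, getpass.getuser(), cluster_id, suffix)

            # unique_mapred_system_dir("123.torque.example.com")
            #   -> "/mapredsystem/userA.123.torque.example.com.4821"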
        Hemanth Yamijala added a comment -

        Luca,

        > After all, I'm not sure exactly why this directory is created, so I might be wrong here, but if it's like a "lock" or a placeholder for the hodring node, I see solution 3 as potentially dangerous, in the sense that having that info in the user's space would not prevent the system from reusing the same resource for a different process run by a different user.

        This is not a lock directory. This is the directory where the jobtracker stores the job files for the jobs it runs, such as job.jar, job.xml, etc. It creates a subdirectory for each job it runs.

        Nicholas,

        > Hemanth, is jobConf.getSystemDir() the dfs directory (/mapredsystem/hostname) you mentioned above? If that is the case, I believe we should implement solution 2.

        Yes. It is the same.

        > In addition, we should also implement solution 3: the path should be unique across HOD. If we use /mapredsystem/hostname, I guess two users on the same host cannot allocate clusters at the same time. Something like /mapredsystem/user-name.clusterid seems good. It might be even better to append a random number at the end.

        Makes sense; I have no problem doing this. If done in addition to 2, this might actually remove the doubt I raised about the number of DFS directories created. In the current configuration of Torque, a clash will not arise, because we don't allow nodes to be allocated to more than one cluster. But it's definitely better not to assume this.

        In case 2 is not done for some reason, this would still address the problem, except for my doubt. Can someone shed some light on that? Is it preferable not to have too many directories created in DFS? (Note that these will be empty once the cluster is deallocated, so no files.)

        Hemanth Yamijala added a comment -

        I think there is consensus that Hadoop JobTracker should delete the mapred system directory, and that HOD should name directories to avoid clashes. I am changing the component name for this JIRA, and am going to open a new one for the HOD bug. I guess both should be addressed for 0.16.1.

        Hemanth Yamijala added a comment -

        Filed HADOOP-2925 to track HOD changes.

        Devaraj Das added a comment -

        This was marked for 0.16.1 but the fix for this is blocked by HADOOP-2812. Marking the fix version as unknown for now.

        Sameer Paranjpye added a comment -

        Did you mean HADOOP-2815?

        Devaraj Das added a comment -

        Yes I meant HADOOP-2815. Thanks for pointing that out!

        Sameer Paranjpye added a comment -

        > 2008-02-26 17:28:42,567 WARN org.apache.hadoop.mapred.JobTracker: Error starting tracker: org.apache.hadoop.ipc.RemoteException:
        > org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=B, access=WRITE, inode="mapredsystem":hadoop:supergroup:rwxr-xr-x

        Doesn't this mean that the Jobtracker failed to write to the /mapredsystem directory? And that it failed because it did not have permission to write into /mapredsystem? I'm not sure that it has anything to do with the jobtracker not cleaning up after itself. That may also cause a problem, but this error message appears to say something different.

        Hemanth Yamijala added a comment -

        We had a discussion with Sameer and Devaraj on this issue. Logically, the mapred system directory is something that the admin would set up in a static jobtracker case; in that case, the admin would be responsible for managing it. In the HOD world, HOD is playing this role and generating the mapred system directory, so it feels like HOD should set it up and clean it up. However, in the current design of HOD, this is not very easy to do in a short time. Therefore, I am moving this to be fixed in HOD in Hadoop 0.17. In the meantime, we can write a simple utility to clean up these directories periodically; the impact of not doing it immediately is reduced with such a utility / script.
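
        A minimal sketch of such an interim cleanup utility, assuming the leftover directories follow the /mapredsystem/<hostname> layout and that the admin supplies the hostnames of deallocated clusters (none of this is part of the patch):

            import subprocess
            import sys

            def cleanup_stale_dirs(hadoop_cmd, namenode_url, hostnames):
                # Run "hadoop dfs -fs <namenode-url> -rmr /mapredsystem/<host>" for each
                # deallocated host; the return code is not checked, so already-missing
                # directories are simply skipped over.
                for host in hostnames:
                    subprocess.call([hadoop_cmd, "dfs", "-fs", namenode_url,
                                     "-rmr", "/mapredsystem/%s" % host])

            if __name__ == "__main__":
                # usage: python cleanup_mapredsystem.py hdfs://namenode:9000 svrX svrY ...
                cleanup_stale_dirs("hadoop", sys.argv[1], sys.argv[2:])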

        Hemanth Yamijala added a comment -

        Attached patch adds code to the hodcleanup process to delete the mapred system directory.

        • The hodring process passes arguments required for the deletion (process id of the job tracker, namenode url, path to the hadoop command, and the name of the mapred system directory) to the hodcleanup process.
        • The hodcleanup process first waits until the jobtracker is down. It waits by checking for the jobtracker process to end.
        • Then it issues a hadoop dfs -fs <namenode-url> -rmr <mapred-system-dir> command.

        Another change made in this patch is the way the namenode url is specified while copying logs to the dfs as part of the hod cleanup process.

        Added a few tests. However, they are not really complete. We need to refactor the HOD code a bit to enable more unit testing, e.g. providing the ability to create a mock simplecommand. These can be handled later.
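
        To make the steps above concrete, here is a sketch of the wait-then-delete flow (an illustration of the approach only, not the patch code; the function names and retry counts are made up):

            import os
            import subprocess
            import time

            def wait_for_jobtracker_exit(jt_pid, retries=12, interval=5):
                """Poll until the jobtracker process is gone, giving up after `retries` checks."""
                for _ in range(retries):
                    try:
                        os.kill(jt_pid, 0)      # signal 0 only checks that the process exists
                    except OSError:
                        return True             # process has exited
                    time.sleep(interval)
                return False

            def remove_mapred_system_dir(hadoop_cmd, namenode_url, sys_dir):
                """Run: hadoop dfs -fs <namenode-url> -rmr <mapred-system-dir>"""
                return subprocess.call([hadoop_cmd, "dfs", "-fs", namenode_url, "-rmr", sys_dir])

            # Only delete the system directory once the jobtracker is really down:
            # if wait_for_jobtracker_exit(jt_pid):
            #     remove_mapred_system_dir(hadoop_cmd, namenode_url, sys_dir)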

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12378256/2899.1.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 8 new or modified tests.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2006/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2006/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2006/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2006/console

        This message is automatically generated.

        Vinod Kumar Vavilapalli added a comment -

        Very minor changes:

        • In testing/testHodCleanup.py, test class testUnresponsiveJobTracker has the log message "Job Tracker did not exit even after a minute. Not going to try and cleanup the system directory". The time 'minute' should instead depend on the number of retries.
        • In the same class, the mrSysDir parameter used is "/user/yhemanth/mapredsystem/hoduser.123.abc.com". Need to change this.

        That said, these are pretty cosmetic, so they can be checked in later, given the shortage of time.

        +1 for the fix. OK for commit.

        Devaraj Das added a comment -

        I just committed this. Thanks, Hemanth!

        Hudson added a comment -
        Integrated in Hadoop-trunk #434 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/434/)
        Hudson added a comment -
        Integrated in Hadoop-trunk #435 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/435/)

          People

          • Assignee: Hemanth Yamijala
          • Reporter: Luca Telloli
          • Votes: 0
          • Watchers: 1
