Hadoop Common / HADOOP-3483

[HOD] Improvements with cluster directory handling

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.18.0
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Modified HOD to create a cluster directory if one does not exist and to auto-deallocate a cluster while reallocating it, if it is already dead.

      Description

      Users have asked for the following improvements to cluster directory handling:

      • Create a new cluster directory if one does not exist.
      • If a cluster directory points to a dead cluster, allocate currently fails with a message asking the user to deallocate it first. Instead, it should issue a warning, deallocate the dead cluster, and automatically allocate a fresh one.
      1. 3483.patch
        2 kB
        Hemanth Yamijala
      2. 3483.1.patch
        3 kB
        Hemanth Yamijala
      3. 3483.2.patch
        2 kB
        Hemanth Yamijala
      4. 3483.3.patch
        4 kB
        Hemanth Yamijala

        Activity

        Mukund Madhugiri added a comment -

        I just committed this for Hemanth. Thanks Hemanth!

        Mukund Madhugiri added a comment -

        The Hudson failures in core and contrib are not caused by this patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383481/3483.3.patch
        against trunk revision 663487.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        Thanks for the checks, Karam.

        I've addressed the issue you mentioned. The logic now looks at the state of the Torque job, rather than the state of the daemons, to decide whether we can auto-deallocate. If the state of the job is completed or exiting, we deallocate; otherwise we just print an error as before.
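
The decision described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the function names and the returned action list are hypothetical, and the single-letter codes assume Torque's usual job-state letters ("C" for completed, "E" for exiting). How HOD actually queries Torque is not shown in this thread, so the job state is simply passed in.

```python
# Hypothetical sketch: decide whether a previously allocated cluster can be
# auto-deallocated, based on the state of its Torque job.

# Assumption: Torque single-letter job states, "C" = completed, "E" = exiting.
DEALLOCATABLE_STATES = frozenset(["C", "E"])


def should_auto_deallocate(job_state):
    """Return True if the old cluster's Torque job is finished or finishing."""
    return job_state in DEALLOCATABLE_STATES


def plan_allocate(job_state):
    """Return the ordered actions for allocating into an already used cluster directory."""
    if should_auto_deallocate(job_state):
        # Old job is gone: warn the user, clean up, then allocate afresh.
        return ["warn", "deallocate", "allocate"]
    # Old job is still active (running, queued, held, ...): refuse as before.
    return ["error"]
```

Keying the decision on the job state rather than on daemon status means a cluster whose JobTracker or NameNode has died, but whose Torque job is still running, is not silently replaced.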

        Karam Singh added a comment -

        Checked that if we pass a non-existent directory to the hod allocate command, e.g. hod allocate -d testdir -n 5, HOD creates the cluster directory.
        Also checked that if the cluster directory is on a non-writable path, HOD properly throws a "Permission denied" error.

        If hod allocate is given a cluster directory whose cluster status is dead, the hod client issues this warning:
        [
        WARNING/30 hod:278 - Found a previously allocated cluster at cluster directory '<cluster dir path>'. Deallocating this cluster to allocate a new one.
        ]
        and reallocates the cluster using that directory. The results are the same for the "mapred dead" clusters we normally encounter.
        However, if the cluster status becomes "mapred dead" because the JobTracker is gone, or "hdfs dead" because the NameNode is gone, while the actual Torque job is still running, HOD issues the warning and reallocates the cluster with the specified cluster directory, but it does not deallocate the previous cluster. The old cluster keeps running and the new cluster job also starts. In this case "hod list" sometimes also displays a mixed status of both clusters: the cluster id of the new cluster but the cluster state of the old cluster.

        Hemanth Yamijala added a comment -

        The failures in the core and contrib tests are in DFS and Streaming, respectively. As these do not depend on HOD, the failures are unrelated to the patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383434/3483.2.patch
        against trunk revision 663447.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        New patch that merges with trunk. Also, fixed one thing that I missed in the last patch, which is to create a new directory if one does not exist.

        Hemanth Yamijala added a comment -

        New patch that addresses the error with deallocating an active cluster. Also, updates some test cases.

        Hemanth Yamijala added a comment -

        Actually, the attached patch is incorrect. It will try to deallocate (incompletely) an allocated cluster that is not actually dead. There should be additional checks. Will upload a new patch.

        Hemanth Yamijala added a comment -

        Didn't attach the file. :)

        Hemanth Yamijala added a comment -

        Simple patch addressing both issues.

        Now, we try to create the cluster directory if it doesn't exist. Likewise, if a cluster becomes dead and an allocate is tried using the same cluster directory, we warn the user that the cluster is being deallocated and allocate a new one using the same directory. No data in the directory is lost.
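
As a rough illustration of the first change, creating the cluster directory on demand might look like the sketch below. The helper name `ensure_cluster_dir` is hypothetical and this is not the code from the attached patch; it only shows the general technique with the standard `os` module.

```python
import os


def ensure_cluster_dir(path):
    """Create the cluster directory if it is missing.

    Returns True if the directory was created, False if it already existed.
    """
    if os.path.isdir(path):
        # Existing directory: leave it and its contents untouched.
        return False
    if os.path.exists(path):
        # Something other than a directory is in the way.
        raise ValueError("%s exists but is not a directory" % path)
    # os.makedirs raises OSError (for example, permission denied) if the
    # directory cannot be created; the caller can report that to the user.
    os.makedirs(path)
    return True
```

Because an existing directory is returned as-is, calling the helper again before a reallocation does not remove anything already stored there.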


          People

          • Assignee:
            Hemanth Yamijala
            Reporter:
            Hemanth Yamijala