Hadoop Common / HADOOP-3483

[HOD] Improvements with cluster directory handling

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.0
    • Fix Version/s: 0.18.0
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Modified HOD to create a cluster directory if one does not exist and to auto-deallocate a cluster while reallocating it, if it is already dead.

      Description

      Users have asked for the following improvements related to cluster directory handling:

      • Create a new cluster directory if one does not exist.
      • If a cluster directory points to a dead cluster, allocate currently fails with a message asking the user to deallocate it first. Instead, it should issue a warning, deallocate the dead cluster, and automatically allocate a fresh one.
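A minimal sketch of the first improvement, written in Python (the language HOD is implemented in). The helper name `ensure_cluster_dir` and its `(path, created)` return value are hypothetical and not HOD's actual API; the sketch only shows "create the directory if it is missing" with the standard library, letting a permission problem on a non-writable parent surface as `PermissionError`.

```python
import errno
import os


def ensure_cluster_dir(path):
    """Create the cluster directory if it does not exist (hypothetical helper).

    Returns (absolute_path, created). Any OSError raised by os.makedirs, such as
    PermissionError for a non-writable parent, is propagated to the caller.
    """
    path = os.path.abspath(os.path.expanduser(path))
    if os.path.isdir(path):
        return path, False  # existing directory: reuse it as-is
    if os.path.exists(path):
        # Something other than a directory already occupies this path.
        raise NotADirectoryError(errno.ENOTDIR, os.strerror(errno.ENOTDIR), path)
    os.makedirs(path)  # also creates any missing parent directories
    return path, True
```

Calling it twice with the same path creates the directory once and reuses it afterwards, which is the behaviour the issue asks for.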

      Attachments

      1. 3483.1.patch (3 kB, Hemanth Yamijala)
      2. 3483.2.patch (2 kB, Hemanth Yamijala)
      3. 3483.3.patch (4 kB, Hemanth Yamijala)
      4. 3483.patch (2 kB, Hemanth Yamijala)

        Activity

        Hemanth Yamijala created issue -
        Hemanth Yamijala added a comment -

        Simple patch addressing both issues.

        Now, we try to create the cluster directory if it doesn't exist. Likewise, if a cluster becomes dead and an allocate is attempted using the same cluster directory, we warn the user that the cluster is being deallocated and allocate a new one using the same directory. No data in the directory is lost.
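The reallocation behaviour described in this comment can be sketched as follows. `ClusterRegistry` is a hypothetical in-memory stand-in for HOD's bookkeeping, and the `dead` flag is an assumed input rather than a real HOD field; the warning text is copied from the log line Karam Singh quotes in this issue. Deallocation here only drops the cluster record and never touches files in the directory, mirroring "No data in the directory is lost."

```python
import logging

log = logging.getLogger("hod.sketch")


class ClusterRegistry:
    """Hypothetical in-memory record of which cluster each directory points to."""

    def __init__(self):
        self.clusters = {}  # cluster_dir -> {"id": str, "dead": bool, "nodes": int}
        self._next_id = 1

    def allocate(self, cluster_dir, nodes):
        existing = self.clusters.get(cluster_dir)
        if existing is not None:
            if not existing["dead"]:
                # A live cluster is never reclaimed automatically.
                raise RuntimeError("cluster directory %s is in use by a live cluster" % cluster_dir)
            log.warning("Found a previously allocated cluster at cluster directory '%s'. "
                        "Deallocating this cluster to allocate a new one.", cluster_dir)
            self.deallocate(cluster_dir)
        cluster_id = "cluster-%d" % self._next_id
        self._next_id += 1
        self.clusters[cluster_dir] = {"id": cluster_id, "dead": False, "nodes": nodes}
        return cluster_id

    def deallocate(self, cluster_dir):
        # Forget the cluster record only; files under cluster_dir are left untouched.
        self.clusters.pop(cluster_dir, None)
```

Marking the first cluster dead and allocating again on the same directory yields a new cluster ID, while a second allocate against the still-live cluster raises.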

        Hemanth Yamijala made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hemanth Yamijala added a comment -

        Didn't attach the file. :)

        Hemanth Yamijala made changes -
        Status: Patch Available [ 10002 ] → Open [ 1 ]
        Hemanth Yamijala made changes -
        Attachment: 3483.patch [ 12383301 ]
        Hemanth Yamijala made changes -
        Component/s: contrib/hod [ 12312090 ]
        Hemanth Yamijala added a comment -

        Actually, the attached patch is incorrect. It will try to deallocate (incompletely) an allocated cluster that is not actually dead. There should be additional checks. I will upload a new patch.

        Hemanth Yamijala added a comment -

        New patch that addresses the error with deallocating an active cluster. Also, updates some test cases.

        Hemanth Yamijala made changes -
        Attachment: 3483.1.patch [ 12383389 ]
        Hemanth Yamijala added a comment -

        New patch that merges with trunk. Also, fixed one thing that I missed in the last patch, which is to create a new directory if one does not exist.

        Hemanth Yamijala made changes -
        Attachment: 3483.2.patch [ 12383434 ]
        Hemanth Yamijala made changes -
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383434/3483.2.patch
        against trunk revision 663447.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2583/console

        This message is automatically generated.

        Hemanth Yamijala added a comment -

        The failures in the core and contrib tests are in DFS and Streaming, respectively. As these do not depend on HOD, the failures are unrelated to the patch.

        Karam Singh added a comment -
        Karam Singh added a comment -

        Checked that if we provide a non-existent directory to the hod allocate command (e.g. hod allocate -d testdir -n 5), HOD creates the cluster directory.
        Also checked that if the cluster directory is on a non-writable path, HOD properly throws a "Permission denied" error.

        If hod allocate is given a cluster directory whose cluster status is dead, the HOD client throws a warning:
        [
        WARNING/30 hod:278 - Found a previously allocated cluster at cluster directory '<cluster dir path>'. Deallocating this cluster to allocate a new one.
        ]
        and reallocates the cluster using that directory. The results are the same for the "mapred dead" clusters we normally encounter.
        But suppose the cluster status becomes "mapred dead" (because the JobTracker is gone) or "hdfs dead" (because the NameNode is gone) while the actual Torque job is still running. In that case HOD throws the warning and reallocates a cluster in the specified cluster directory, but it does not deallocate the previous cluster, so the old cluster keeps running and the new cluster's job also starts. Also, in this case "hod list" sometimes displays a mixed status for the two clusters: the cluster ID of the new cluster but the cluster state of the old one.
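Karam's last scenario can be reproduced with a toy model. Everything here is hypothetical: `torque_jobs` maps a job ID to a Torque job-state letter (`R` running, `C` completed), and `naive_reallocate` stands in for a check driven by daemon status ("dead", "mapred dead", "hdfs dead") rather than by the Torque job itself. Because the old job is never stopped, one cluster directory ends up backed by two running jobs.

```python
def naive_reallocate(torque_jobs, dir_record, daemon_status, new_job_id):
    """Illustrative daemon-status-based decision: any 'dead' status allows reallocation.

    torque_jobs maps job_id -> Torque job_state letter ('R' running, 'C' completed).
    The previous job is NOT terminated; only the directory record is overwritten.
    """
    if daemon_status in ("dead", "mapred dead", "hdfs dead"):
        torque_jobs[new_job_id] = "R"       # the new cluster's job starts
        dir_record["job_id"] = new_job_id   # the directory now points at the new job
    return dir_record


jobs = {"101": "R"}          # JobTracker is gone, but Torque job 101 is still running
record = {"job_id": "101"}
naive_reallocate(jobs, record, "mapred dead", "102")
running = sorted(j for j, state in jobs.items() if state == "R")
print(record["job_id"], running)  # 102 ['101', '102']
```

The directory record now names job 102 while job 101 is still running, which is the kind of state that can make a listing mix one cluster's ID with another cluster's status.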

        Hemanth Yamijala made changes -
        Status: Patch Available [ 10002 ] → Open [ 1 ]
        Hemanth Yamijala made changes -
        Attachment: 3483.3.patch [ 12383481 ]
        Hemanth Yamijala added a comment -

        Thanks for the checks Karam.

        I've addressed the issue you mentioned. The logic now looks at the state of the Torque job, rather than the state of the daemons, to decide whether we can auto-deallocate. If the job's state is completed or exiting, we deallocate; otherwise we just print an error as before.
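The revised rule can be sketched as a check on the Torque `job_state` letter (the field `qstat -f` reports), where `C` means completed after having run and `E` means exiting after having run. The function names and the returned action strings are hypothetical, and how HOD actually obtains the job state is not shown here.

```python
# Torque job_state letters include Q (queued), R (running), H (held),
# E (exiting after having run) and C (completed after having run).
RECLAIMABLE_STATES = frozenset({"C", "E"})


def can_auto_deallocate(job_state):
    """True only when the Torque job behind the cluster has finished or is finishing."""
    return job_state in RECLAIMABLE_STATES


def allocate_action(job_state):
    """Hypothetical decision for 'hod allocate' on a directory with an existing cluster."""
    if can_auto_deallocate(job_state):
        return "warn, deallocate, allocate new"
    # Daemons may be dead while the job still runs (e.g. 'mapred dead'): refuse as before.
    return "error"


for state in ("R", "E", "C", "Q"):
    print(state, allocate_action(state))
```

Keying the decision on the job state rather than the daemon state means a running job whose JobTracker or NameNode has died produces an error instead of a second, duplicate allocation.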

        Hemanth Yamijala made changes -
        Release Note: Modified HOD to create a cluster directory if one does not exist and to auto-deallocate a cluster while reallocating it, if it is already dead.
        Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hadoop Flags: [Incompatible change, Reviewed]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12383481/3483.3.patch
        against trunk revision 663487.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2594/console

        This message is automatically generated.

        Mukund Madhugiri added a comment -
        Mukund Madhugiri added a comment -

        The Hudson failures in core and contrib are not from this patch.

        Mukund Madhugiri added a comment -
        Mukund Madhugiri added a comment -

        I just committed this for Hemanth. Thanks Hemanth!

        Mukund Madhugiri made changes -
        Resolution: Fixed [ 1 ]
        Hadoop Flags: [Reviewed, Incompatible change] → [Incompatible change, Reviewed]
        Status: Patch Available [ 10002 ] → Resolved [ 5 ]
        Nigel Daley made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]

        Transition                  | Time In Source Status | Execution Times | Last Executer     | Last Execution Date
        Patch Available → Open      | 13h 13m               | 2               | Hemanth Yamijala  | 05/Jun/08 19:09
        Open → Patch Available      | 1d 17h 52m            | 3               | Hemanth Yamijala  | 05/Jun/08 19:13
        Patch Available → Resolved  | 3h 59m                | 1               | Mukund Madhugiri  | 05/Jun/08 23:12
        Resolved → Closed           | 77d 21h 38m           | 1               | Nigel Daley       | 22/Aug/08 20:50

          People

          • Assignee:
            Hemanth Yamijala
            Reporter:
            Hemanth Yamijala
          • Votes:
            0
            Watchers:
            0
