Hadoop Common
  1. Hadoop Common
  2. HADOOP-4145

[HOD] Support an accounting plugin script for HOD

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.1
    • Component/s: contrib/hod
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Production environments have accounting packages to track usage of a cluster. HOD should provide a mechanism to plug-in an accounting related script that can be used to verify if the user is using a valid account or not.

      1. HADOOP-4145.patch
        8 kB
        Hemanth Yamijala
      2. HADOOP-4145.patch
        8 kB
        Hemanth Yamijala
      3. HADOOP-4145.patch
        8 kB
        Hemanth Yamijala

        Activity

        Hide
        Vinod Kumar Vavilapalli added a comment -

        [..] we will live with the additional blank line for the time being, and address in a separate JIRA.

        Being done as part of HADOOP-4235.

        Show
        Vinod Kumar Vavilapalli added a comment - [..] we will live with the additional blank line for the time being, and address in a separate JIRA. Being done as part of HADOOP-4235 .
        Hide
        Nigel Daley added a comment -

        I just committed this. Thank Hemanth!

        Show
        Nigel Daley added a comment - I just committed this. Thank Hemanth!
        Hide
        Hemanth Yamijala added a comment -

        Spoke to Vinod. Given we have to move this to Hadoop 0.18.1 ASAP, we will live with the additional blank line for the time being, and address in a separate JIRA.

        Show
        Hemanth Yamijala added a comment - Spoke to Vinod. Given we have to move this to Hadoop 0.18.1 ASAP, we will live with the additional blank line for the time being, and address in a separate JIRA.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12389985/HADOOP-4145.patch
        against trunk revision 694562.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12389985/HADOOP-4145.patch against trunk revision 694562. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3256/console This message is automatically generated.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        One very minor comment. When we print the error messages in case of non-zero script exit code, we should strip new lines at the end of each message, otherwise log.critical also appends a new line character to each message and blank lines creep into log output.

        Show
        Vinod Kumar Vavilapalli added a comment - One very minor comment. When we print the error messages in case of non-zero script exit code, we should strip new lines at the end of each message, otherwise log.critical also appends a new line character to each message and blank lines creep into log output.
        Hide
        Hemanth Yamijala added a comment -

        The validation script would also require the number of nodes, queue and optionally estimated walltime

        Discussed this with Rob. We decided as this was not required for now, and will not require a change to user interface when needed in future, we'll go with the existing approach of only passing in account name. If more parameters are required, I'd actually prefer to think of passing it in some generic way to scripts - because it is a plug-in hardcoding a specific list of params might not be a good idea.

        It should fail with "Job Submission Failure"

        Modified the exit code to 4, which is the closest we have to job submission failure. Atleast both come in similar circumstances.

        Show
        Hemanth Yamijala added a comment - The validation script would also require the number of nodes, queue and optionally estimated walltime Discussed this with Rob. We decided as this was not required for now, and will not require a change to user interface when needed in future, we'll go with the existing approach of only passing in account name. If more parameters are required, I'd actually prefer to think of passing it in some generic way to scripts - because it is a plug-in hardcoding a specific list of params might not be a good idea. It should fail with "Job Submission Failure" Modified the exit code to 4, which is the closest we have to job submission failure. Atleast both come in similar circumstances.
        Hide
        Hemanth Yamijala added a comment -

        Running through Hudson again

        Show
        Hemanth Yamijala added a comment - Running through Hudson again
        Hide
        Hemanth Yamijala added a comment -

        New patch addressing two defects in the older one:

        • Incorporated Rajiv's comments on exit code.
        • I'd forgotten to include the verify-install script in the patch.
        Show
        Hemanth Yamijala added a comment - New patch addressing two defects in the older one: Incorporated Rajiv's comments on exit code. I'd forgotten to include the verify-install script in the patch.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12389921/HADOOP-4145.patch
        against trunk revision 694483.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12389921/HADOOP-4145.patch against trunk revision 694483. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3251/console This message is automatically generated.
        Hide
        Rajiv Chittajallu added a comment -

        > - Hod client looks for a specific file in the install-directory/bin folder called verify-account. If this file is found, before cluster allocation is begun, it executes the file passing in resource_manager.pbs-account as a parameter to the script.

        The validation script would also require the number of nodes, queue and optionally estimated walltime

        Queue name and the number of nodes are mandatory be mandatory.

        > - Hod exits the allocation process with a return value of 5, standing for job execution failure.

        It should fail with "Job Submission Failure"

        Show
        Rajiv Chittajallu added a comment - > - Hod client looks for a specific file in the install-directory/bin folder called verify-account . If this file is found, before cluster allocation is begun, it executes the file passing in resource_manager.pbs-account as a parameter to the script. The validation script would also require the number of nodes, queue and optionally estimated walltime Queue name and the number of nodes are mandatory be mandatory. > - Hod exits the allocation process with a return value of 5, standing for job execution failure. It should fail with "Job Submission Failure"
        Hide
        Hemanth Yamijala added a comment -

        Running through Hudson

        Show
        Hemanth Yamijala added a comment - Running through Hudson
        Hide
        Hemanth Yamijala added a comment -

        Patch addressing review comments.

        Show
        Hemanth Yamijala added a comment - Patch addressing review comments.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        In the error-code description in the user-guide, it would be good if we have a link to account verification related documentation in the admin-guide.

        Show
        Vinod Kumar Vavilapalli added a comment - In the error-code description in the user-guide, it would be good if we have a link to account verification related documentation in the admin-guide.
        Hide
        Vinod Kumar Vavilapalli added a comment -

        Very minor comments, mostly related to logging.

        • is_valid_account() documentation string should read as "<work-dir>/verify-account" or "<install-dir>/bin/verify-account" instead of "<install-dir>/verify-account". Actual usage is correct.
        • Would be clear if, while logging, the output from the verify-account script file is marked, for e.g. prepended with "verify-account output : ".
        • Irrespective of the return code
          • we should just log the output of the script at debug level.
          • we should conclude the operation by a info log statement - HOD verify-account succeeded / failed with error code <errCode>, with script output <scriptOut>
        • When verify-account returns a non-zero error code, we log "err msg from script: %s' % errMsg" - this should be "output msg from script". This msg will go away if we do what the immediately above point describes.

        Other than these, +1 for the patch, it works as described.

        Show
        Vinod Kumar Vavilapalli added a comment - Very minor comments, mostly related to logging. is_valid_account() documentation string should read as "<work-dir>/verify-account" or "<install-dir>/bin/verify-account" instead of "<install-dir>/verify-account". Actual usage is correct. Would be clear if, while logging, the output from the verify-account script file is marked, for e.g. prepended with "verify-account output : ". Irrespective of the return code we should just log the output of the script at debug level. we should conclude the operation by a info log statement - HOD verify-account succeeded / failed with error code <errCode>, with script output <scriptOut> When verify-account returns a non-zero error code, we log "err msg from script: %s' % errMsg" - this should be "output msg from script". This msg will go away if we do what the immediately above point describes. Other than these, +1 for the patch, it works as described.
        Hide
        Hemanth Yamijala added a comment -

        The attached patch implements the following changes in HOD. There are slight differences from the proposal listed above, which are also described below.

        Changes:

        • Hod client looks for a specific file in the install-directory/bin folder called verify-account. If this file is found, before cluster allocation is begun, it executes the file passing in resource_manager.pbs-account as a parameter to the script.
        • If the script's exit code is non-zero, Hod client fails allocation. Any output generated by the script (either on stdout or stderr) is printed to the Hod client.
        • Hod exits the allocation process with a return value of 5, standing for job execution failure.
        • If the verify-account script is not found, or if the script returns a zero exit code, Hod client proceeds allocation as is.
        • A dummy script file verify-account is added in the patch which sites can modify with their custom verification script. The dummy script file just exits with 0.
        • Documentation changes are done in the Hod admin guide and user guide.

        Changes from the proposal:

        • There are no configuration options introduced in this patch. This is essentially because Hod cannot control users overriding configuration set up by administrators. And the account script file should not be overridden.
        • No support for passing parameters to the account verification script. Again the reason is roughly the same as for the first point above. Also, we don't have enough use cases to see if this additional feature is essential. It should be possible to workaround by generating verification scripts that directly have these arguments.
        • All output from the script is displayed to the user, not just stderr output. This is because HOD currently has no good way of capturing only stderr output.
        Show
        Hemanth Yamijala added a comment - The attached patch implements the following changes in HOD. There are slight differences from the proposal listed above, which are also described below. Changes: Hod client looks for a specific file in the install-directory/bin folder called verify-account . If this file is found, before cluster allocation is begun, it executes the file passing in resource_manager.pbs-account as a parameter to the script. If the script's exit code is non-zero, Hod client fails allocation. Any output generated by the script (either on stdout or stderr) is printed to the Hod client. Hod exits the allocation process with a return value of 5, standing for job execution failure. If the verify-account script is not found, or if the script returns a zero exit code, Hod client proceeds allocation as is. A dummy script file verify-account is added in the patch which sites can modify with their custom verification script. The dummy script file just exits with 0. Documentation changes are done in the Hod admin guide and user guide. Changes from the proposal: There are no configuration options introduced in this patch. This is essentially because Hod cannot control users overriding configuration set up by administrators. And the account script file should not be overridden. No support for passing parameters to the account verification script. Again the reason is roughly the same as for the first point above. Also, we don't have enough use cases to see if this additional feature is essential. It should be possible to workaround by generating verification scripts that directly have these arguments. All output from the script is displayed to the user, not just stderr output. This is because HOD currently has no good way of capturing only stderr output.
        Hide
        Hemanth Yamijala added a comment -

        Proposal:

        • We can define an optional hod.account-verification-script in the hod configuration.
        • If present, then hod will launch this as:
              ${hod.account-verification-script} ${resource_manager.pbs-account} ${hod.account-verification-script-params}
          
        • The account-verification-script-params is to allow for us to pass other information in the future, for e.g. maybe a URL to a custom accounting package, etc.
        • If this script returns a zero exit code, all is fine, and the job proceeds for allocation, passing the account name as before.
        • If the script returns a non-zero exit code, allocation fails. The script can write a descriptive message to stderr and that will be printed by HOD to the user.
        • If no script option is specified, hod allocation works as it does currently.
        Show
        Hemanth Yamijala added a comment - Proposal: We can define an optional hod.account-verification-script in the hod configuration. If present, then hod will launch this as: ${hod.account-verification-script} ${resource_manager.pbs-account} ${hod.account-verification-script-params} The account-verification-script-params is to allow for us to pass other information in the future, for e.g. maybe a URL to a custom accounting package, etc. If this script returns a zero exit code, all is fine, and the job proceeds for allocation, passing the account name as before. If the script returns a non-zero exit code, allocation fails. The script can write a descriptive message to stderr and that will be printed by HOD to the user. If no script option is specified, hod allocation works as it does currently.

          People

          • Assignee:
            Hemanth Yamijala
            Reporter:
            Hemanth Yamijala
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development