Hadoop Map/Reduce / MAPREDUCE-3359

Yarn clients / AM should be able to provide config options to the RM / NM

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

      Description

      The RM and NM do not read a job's configuration. Clients / AMs should, however, be able to configure certain parameters for the RM/NM on a per-app basis, such as the log retention policy or token cancellation.

      Attachments

      1. MR-3359.txt
        14 kB
        Robert Joseph Evans
      2. MR-3359.txt
        11 kB
        Robert Joseph Evans

        Activity

        Siddharth Seth added a comment -

        This could be a key-value list in the ContainerLaunchContext (ApplicationSubmissionContext would be better, but that isn't available to the NM).
        An alternative would be to add individual properties, like setLogAggPolicy() and setCancelTokenOnAppComplete(). That list would just end up growing, though, so I prefer the first option.
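The two API shapes under discussion can be sketched as follows. This is a hypothetical illustration, not the real YARN API: the class and method names here (setConfigOption, setCancelTokensOnComplete) are stand-ins for whatever the submission/launch context would expose.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting the two options discussed above:
// a generic key-value bag vs. explicit typed setters.
class SubmissionContextSketch {

    // Option 1: a generic key-value list carried in the launch context.
    // Flexible, but untyped and easy to get confused about.
    private final Map<String, String> appConfig = new HashMap<>();

    void setConfigOption(String key, String value) {
        appConfig.put(key, value);
    }

    String getConfigOption(String key, String defaultValue) {
        return appConfig.getOrDefault(key, defaultValue);
    }

    // Option 2: explicit, typed properties. Clearer and self-documenting,
    // but the API surface grows with every new per-app knob.
    private boolean cancelTokensOnComplete = true;

    void setCancelTokensOnComplete(boolean cancel) {
        this.cancelTokensOnComplete = cancel;
    }

    boolean getCancelTokensOnComplete() {
        return cancelTokensOnComplete;
    }
}
```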

        Robert Joseph Evans added a comment -

        This patch adds in support for mapreduce.job.complete.cancel.delegation.tokens, which was left out of MRV2. I have not added in any new tests yet to verify the code works as expected. This is mostly to get feedback on the patch.

        Mahadev konar added a comment - - edited

        I think key-value pairs are too generic and can get very confusing to make sense of. I don't think we'll be adding too many such options to the submission context, so I'd prefer an API to make it explicit and less confusing to use.

        Siddharth Seth added a comment -

        I think key value pairs are too generic and can get very confusing to make sense of.

        Looking at JobConf and Job - they already have lots of individual APIs (along with allowing config via a conf object in Job). The YARN RM/NM have explicit APIs only (from an app's point of view) - maybe it is better to leave it as is.
        The list will just end up growing, though - there are 2-3 log parameters which eventually need to make it through to an NM, and maybe other parameters later.

        Daryn Sharp added a comment -

        Although I helped contribute to this jira, I have concerns regarding its safety and hope it's a temporary fix.

        I feel the config is of questionable value since a misbehaving client may "forget" to cancel its tokens. The NN holds tokens in memory, so this could lead to a potential, and perhaps unintentional, denial-of-service attack.

        When tokens are shared between jobs, it's ambiguous as to when the tokens can be safely cancelled. How does a client know that other running or queued jobs are using the tokens? If the client intends to launch multiple jobs, but the client errors out, the tokens can't be cancelled or "very bad" things will happen to the jobs already submitted. Tasks will pound on the NN every second with the bad token, and yarn tasks appear to run "forever" if rpc connections fail. In a test env, orphaned tasks had pounded on the NN every second for a month.

        Allowing the RM to cancel tokens when the job completes, which implies tokens are good for one and only one job submission, removes the ambiguity of when it's safe to cancel them. This reduces the chance of a DoS attack on the NN and, from a security perspective, closes the window of exposure vs. allowing tokens to linger until their lifetime expires.

        Robert Joseph Evans added a comment -

        I think that protocol buffers offer some benefits over generic key/value pairs. They include types, even complex types that are very difficult to do with just key/value pairs, help to ensure binary compatibility, and have default values built in. Yes, if we remove a config value its field number must remain reserved for eternity, and if we add a new config it must be added to the proto file. Keeping a number reserved is not that painful, and after having gone through the process of trying to document/clean up all of the yarn config options that were not published in yarn-defaults.xml, I am 100% in favor of forcing a single machine-readable location for parameter values.
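The trade-off Bobby describes can be illustrated with a small proto2-style fragment. This is a hypothetical example, not the actual YARN .proto file; the message and field names here are invented for illustration.

```proto
// Hypothetical fragment: typed fields with built-in defaults, and field
// numbers that must stay reserved once a config option is removed.
message AppSubmissionContextSketchProto {
  // A removed config option keeps its number (and name) reserved forever,
  // so old serialized messages can never be misread.
  reserved 7;
  reserved "log_retention_hours";

  // A typed field with a default value baked into the schema.
  optional bool cancel_tokens_when_complete = 8 [default = true];
}
```

Compared with a string key-value bag, the schema itself documents the type, the default, and the full set of supported options in one machine-readable place.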

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12502522/MR-3359.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1249//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1249//console

        This message is automatically generated.

        Robert Joseph Evans added a comment -

        Canceling to upload an updated patch, with more testing.

        Mahadev konar added a comment -

        Daryn,
        Very valid concern. We should have a broader discussion on it. For now, to keep things backwards compatible, I think Bobby's approach should be fine.

        Siddharth Seth added a comment -

        Took a quick look at the patch. The PB part needs a bit of change - it doesn't look like the set call will ever make it back to the proto object. Ref NodeHealthStatusPBImpl.setIsNodeHealth().
        Will continue looking.

        We should just move this to 3291 which is what this patch is fixing, and close out this jira since we're not adding a k-v pair to pass configuration to the NM/RM.
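The bug class Sid points at can be sketched as follows: in the Hadoop PBImpl wrappers, a setter must write the value through to the proto builder, or it is silently lost when the message is built. This is a hypothetical stand-in, not the actual patch code; the builder and class names here are invented.

```java
// Hypothetical sketch of the write-through pattern in a *PBImpl class.
class PBImplSketch {

    // Minimal stand-in for a generated protobuf builder.
    static class ContextProtoBuilder {
        boolean cancelTokens = true;
        ContextProtoBuilder setCancelTokens(boolean b) {
            cancelTokens = b;
            return this;
        }
    }

    private ContextProtoBuilder builder;
    private boolean cachedCancelTokens = true; // local cache, never serialized

    private void maybeInitBuilder() {
        if (builder == null) {
            builder = new ContextProtoBuilder();
        }
    }

    // Buggy version: updates only the local field, so the value never
    // reaches the proto object and is lost on serialization.
    void setCancelTokensBuggy(boolean cancel) {
        this.cachedCancelTokens = cancel;
    }

    // Fixed version: writes through to the builder, as the real PBImpls do.
    void setCancelTokens(boolean cancel) {
        maybeInitBuilder();
        builder.setCancelTokens(cancel);
    }

    boolean getCancelTokensFromProto() {
        maybeInitBuilder();
        return builder.cancelTokens;
    }
}
```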

        Mahadev konar added a comment -

        Bobby,
        the patch looks fine. A minor nit: we should add javadoc on setCancelTokensWhenComplete to make sure we warn users not to use it without reason. Something like:

        WARNING: this is not recommended unless you want your single job's tokens to be reused by other jobs (highly unlikely).

        Couldn't come up with something better. Feel free to innovate!

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12502527/MR-3359.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1250//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1250//console

        This message is automatically generated.

        Mahadev konar added a comment -

        Also, I think the patch should be filed against MAPREDUCE-3291, no? We can leave this open for any APIs for logs or others. I'm not too picky about it; we can mark the other one as duplicate, but we might want to change the subject of this jira to reflect what it's doing. The current subject is very vague and confusing.

        Mahadev konar added a comment -

        Damn! Didn't read Sid's comments.

        Robert Joseph Evans added a comment -

        I will move it over to MAPREDUCE-3291. I will also take a look at the PB code to see if it is getting set correctly or not.

        Robert Joseph Evans added a comment -

        Resolving as per the discussion on this jira.


  People

  • Assignee: Robert Joseph Evans
  • Reporter: Siddharth Seth
  • Votes: 0
  • Watchers: 2
