Hadoop YARN
  1. Hadoop YARN
  2. YARN-147

Add support for CPU isolation/monitoring of containers

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: 2.0.3-alpha
    • Component/s: nodemanager
    • Labels:
      None

      Description

      This is a clone for YARN-3 to be able to submit the patch as YARN-3 does not show the SUBMIT PATCH button.

      1. YARN-147-v9.patch
        44 kB
        Andrew Ferguson
      2. YARN-147-v8.patch
        44 kB
        Andrew Ferguson
      3. YARN-147-v6.patch
        43 kB
        Andrew Ferguson
      4. YARN-147-v5.patch
        43 kB
        Andrew Ferguson
      5. YARN-147-v4.patch
        43 kB
        Andrew Ferguson
      6. YARN-147-v3.patch
        43 kB
        Andrew Ferguson
      7. YARN-147-v2.patch
        41 kB
        Andrew Ferguson
      8. YARN-147-v1.patch
        40 kB
        Andrew Ferguson
      9. YARN-3.patch
        40 kB
        Alejandro Abdelnur

        Issue Links

          Activity

          Hide
          Alejandro Abdelnur added a comment -

          Closing this as dup of YARN-3.

          Show
          Alejandro Abdelnur added a comment - Closing this as dup of YARN-3 .
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12561571/YARN-147-v9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/235//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/235//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12561571/YARN-147-v9.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/235//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/235//console This message is automatically generated.
          Hide
          Andrew Ferguson added a comment -

          hi,

          this version fixes the broken test case in the previous patch, and should hopefully fix the findbugs warning.

          thanks!
          Andrew

          Show
          Andrew Ferguson added a comment - hi, this version fixes the broken test case in the previous patch, and should hopefully fix the findbugs warning. thanks! Andrew
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555695/YARN-147-v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

          org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/230//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/230//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/230//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555695/YARN-147-v8.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/230//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/230//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/230//console This message is automatically generated.
          Hide
          Andrew Ferguson added a comment -

          updated as per reviews on comments here and on YARN-3.

          Show
          Andrew Ferguson added a comment - updated as per reviews on comments here and on YARN-3 .
          Hide
          Andrew Ferguson added a comment -

          hi Arun C Murthy, I've started posting replies on YARN-3 instead. the LCE bug is fixed and I'll post a new patch after addressing Vinod Kumar Vavilapalli's comments. thanks!

          Show
          Andrew Ferguson added a comment - hi Arun C Murthy , I've started posting replies on YARN-3 instead. the LCE bug is fixed and I'll post a new patch after addressing Vinod Kumar Vavilapalli 's comments. thanks!
          Hide
          Arun C Murthy added a comment -

          Cancelling patch while comments are addressed, particularly the one Sid raised - we can't break LCE.

          Also, we need to make sure this continues to work on RHEL5/CentOS5 which doesn't have cgroups.

          One more thing - can we please do reviews/discussions on YARN-3 to ensure we keep track in one place? Thanks.

          Show
          Arun C Murthy added a comment - Cancelling patch while comments are addressed, particularly the one Sid raised - we can't break LCE. Also, we need to make sure this continues to work on RHEL5/CentOS5 which doesn't have cgroups. One more thing - can we please do reviews/discussions on YARN-3 to ensure we keep track in one place? Thanks.
          Hide
          Vinod Kumar Vavilapalli added a comment -

          Got lost between YARN-3 and YARN-147

          Very nice to have patch, I am willing to get it in ASAP given the amount of time it's been around.

          Some comments below.

          • yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml
          • If the configs can be done away with (see below), ignore this comment. The descriptions for all the new configs in yarn-default.xml heavily reference code. We should simplify them to not address code and instead make them understandable by users and cross reference other related parameters.
          •     // Based on testing, ApplicationMaster executables don't terminate until
                // a little after the container appears to have finished. Therefore, we
                // wait a short bit for the cgroup to become empty before deleting it.
               

            Can you explain this? Is this sleep necessary. Depending on its importance, we'll need to fix the following Id check, AMs don't always have ID equaling one.

          • container-executor.c: If a mount-point is already mounted, mount gives a EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. on NM restarts). We could unmount cgroup fs on shutdown but that isn't always guaranteed.
          • Please update if you have tested it on a secure setup with LCE enabled with and without cgroups.

          The following are already raised by others in some way, but I don't see them fixed in the latest patch. Unless I am missing something:

          • Not sure of the benefit of configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix.
          • CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the later, if cgroups are enabled, and we can't create the file, it is a critical error?

          One overarching improvement worth pursing immediately, either now or in follow up tickets:

          • Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMinotor is top-level, we should make ResourcesHandler also top-level so that other platforms don't need to create this type-hierarchy all over again when they wish to implement some or all of this functionality.
          Show
          Vinod Kumar Vavilapalli added a comment - Got lost between YARN-3 and YARN-147 Very nice to have patch, I am willing to get it in ASAP given the amount of time it's been around. Some comments below. yarn.nodemanager.linux-container-executor.cgroups.mount has different defaults in code and in yarn-default.xml If the configs can be done away with (see below), ignore this comment. The descriptions for all the new configs in yarn-default.xml heavily reference code. We should simplify them to not address code and instead make them understandable by users and cross reference other related parameters. // Based on testing, ApplicationMaster executables don't terminate until // a little after the container appears to have finished. Therefore, we // wait a short bit for the cgroup to become empty before deleting it. Can you explain this? Is this sleep necessary. Depending on its importance, we'll need to fix the following Id check, AMs don't always have ID equaling one. container-executor.c: If a mount-point is already mounted, mount gives a EBUSY error, mount_cgroup() will need to be fixed to support remounts (for e.g. on NM restarts). We could unmount cgroup fs on shutdown but that isn't always guaranteed. Please update if you have tested it on a secure setup with LCE enabled with and without cgroups. The following are already raised by others in some way, but I don't see them fixed in the latest patch. Unless I am missing something: Not sure of the benefit of configurable yarn.nodemanager.linux-container-executor.cgroups.mount-path. Couldn't NM just always mount to a path that it creates and owns? Similar comment for the hierarchy-prefix. CgroupsLCEResourcesHandler is swallowing exceptions and errors in multiple places - updateCgroup() and createCgroup(). In the later, if cgroups are enabled, and we can't create the file, it is a critical error? One overarching improvement worth pursing immediately, either now or in follow up tickets: Make ResourcesHandler top level. I'd like to merge the ContainersMonitor functionality with this so as to monitor/enforce memory limits also. ContainersMinotor is top-level, we should make ResourcesHandler also top-level so that other platforms don't need to create this type-hierarchy all over again when they wish to implement some or all of this functionality.
          Hide
          Siddharth Seth added a comment -

          Andrew, could you please comment on testing with and without cgroups.

          I don't know enough about cgroups to review the functionality. Otherwise, if this has been tested, I'm also +1 on getting this in (after resolving the previous comment), based on the previous reviews. Thanks!

          Minor,

          • LCEResourcesHandler, some of the function names (preExecute / postExecute) are a little tough to understand, without looking at where they're used. Would be nice to have JavaDocs / and maybe more verbose function names.
            This is nice to have, and can be a separate jira.
          Show
          Siddharth Seth added a comment - Andrew, could you please comment on testing with and without cgroups. I don't know enough about cgroups to review the functionality. Otherwise, if this has been tested, I'm also +1 on getting this in (after resolving the previous comment), based on the previous reviews. Thanks! Minor, LCEResourcesHandler, some of the function names (preExecute / postExecute) are a little tough to understand, without looking at where they're used. Would be nice to have JavaDocs / and maybe more verbose function names. This is nice to have, and can be a separate jira.
          Hide
          Siddharth Seth added a comment -

          From a quick look, I'm not sure if the LCE will continue to work without cgroups. ? LAUNCH_CONTAINER seems to want a "key=" as the resources parameter, and the DefaultLCEResourceHandler sends in a "none".

          Show
          Siddharth Seth added a comment - From a quick look, I'm not sure if the LCE will continue to work without cgroups. ? LAUNCH_CONTAINER seems to want a "key=" as the resources parameter, and the DefaultLCEResourceHandler sends in a "none".
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12550094/YARN-147-v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/105//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/105//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550094/YARN-147-v5.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/105//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/105//console This message is automatically generated.
          Hide
          Alejandro Abdelnur added a comment -

          +1 pending jenkins (kicked it)

          Show
          Alejandro Abdelnur added a comment - +1 pending jenkins (kicked it)
          Hide
          Colin Patrick McCabe added a comment -

          Looks good to me.

          Show
          Colin Patrick McCabe added a comment - Looks good to me.
          Hide
          Andrew Ferguson added a comment -

          updated patch with most recent suggestions from Colin.

          Show
          Andrew Ferguson added a comment - updated patch with most recent suggestions from Colin.
          Hide
          Andrew Ferguson added a comment -

          thanks again, Colin. I'm going to stick with strchr instead of strspn because I think it's clearer that we get NULL when there is no = in the input string, rather than having to check if the length of the "key" is the same as the length of the string in that case.

          thanks for noticing that the error handling wasn't quite right in mount_cgroup.

          updated patch to follow in a sec.

          Show
          Andrew Ferguson added a comment - thanks again, Colin. I'm going to stick with strchr instead of strspn because I think it's clearer that we get NULL when there is no = in the input string, rather than having to check if the length of the "key" is the same as the length of the string in that case. thanks for noticing that the error handling wasn't quite right in mount_cgroup . updated patch to follow in a sec.
          Hide
          Colin Patrick McCabe added a comment -

          get_kv_key, get_kv_value: thanks for implementing the API I suggested. It looks a lot better. You might consider using strspn to simplify the code a bit (it returns a number of bytes, rather than a pointer, like index does.) However, this is up to you (it's fine as-is). It's more traditional to put the doxygen into the header file than the .c file. Header files tend to define an interface, so that's where I go to find out how that interface should be used.

          mount_cgroup: it looks like if result = -1, the call to fprintf will print out errno, which has not been set. You could either set errno instead of result in the preceding code, or restructure the code a bit.

          Show
          Colin Patrick McCabe added a comment - get_kv_key , get_kv_value : thanks for implementing the API I suggested. It looks a lot better. You might consider using strspn to simplify the code a bit (it returns a number of bytes, rather than a pointer, like index does.) However, this is up to you (it's fine as-is). It's more traditional to put the doxygen into the header file than the .c file. Header files tend to define an interface, so that's where I go to find out how that interface should be used. mount_cgroup : it looks like if result = -1 , the call to fprintf will print out errno , which has not been set. You could either set errno instead of result in the preceding code, or restructure the code a bit.
          Hide
          Andrew Ferguson added a comment -

          small fix in two places: don't log & re-throw the same exception – construct new exceptions with better context, and use the previous one as the cause.

          thanks Tucu for pointing this out!

          Show
          Andrew Ferguson added a comment - small fix in two places: don't log & re-throw the same exception – construct new exceptions with better context, and use the previous one as the cause. thanks Tucu for pointing this out!
          Hide
          Andrew Ferguson added a comment -

          hi Colin,

          thanks for looking at the native code. since the changes were pretty extensive, would you mind taking a careful look again? if it's easier for you, the incremental changes can be seen here:
          https://github.com/adferguson/hadoop-common/commits/adf-yarn-147

          I hope I've faithfully implemented the new key-value API you suggested – let me know if that's not the case.

          If the mount fails, I let the exception bubble all the way up to stop the NodeManager, as Tucu suggested before about a different error.

          The one thing I did not do is change the open / write / close methods to fopen / fprintf / fclose, as the rest of the native code does not use those methods. Which would you prefer to see: adjust my patch to use fopen, etc., or fix my use of open, etc.?

          Yes, I totally agree that it would be better if main.c used getopt_long; it definitely smells like another JIRA to me.

          thanks!
          Andrew

          Show
          Andrew Ferguson added a comment - hi Colin, thanks for looking at the native code. since the changes were pretty extensive, would you mind taking a careful look again? if it's easier for you, the incremental changes can be seen here: https://github.com/adferguson/hadoop-common/commits/adf-yarn-147 I hope I've faithfully implemented the new key-value API you suggested – let me know if that's not the case. If the mount fails, I let the exception bubble all the way up to stop the NodeManager, as Tucu suggested before about a different error. The one thing I did not do is change the open / write / close methods to fopen / fprintf / fclose, as the rest of the native code does not use those methods. Which would you prefer to see: adjust my patch to use fopen, etc., or fix my use of open, etc.? Yes, I totally agree that it would be better if main.c used getopt_long; it definitely smells like another JIRA to me. thanks! Andrew
          Hide
          Andrew Ferguson added a comment -

          update native code per review by Colin

          Show
          Andrew Ferguson added a comment - update native code per review by Colin
          Hide
          Colin Patrick McCabe added a comment -

          Thanks for working on this, Andrew! I took a quick look at the native part of this patch.

          get_kv_key, get_kv_value: I would recommend that your API look something like this:

          /**
           * If str is a string of the form key=val, find 'key'
           * 
           * @param input    The input string
           * @param out      Where to put the output string.
           * @param out_len  The length of the output buffer.
           *
           * @return         -ENAMETOOLONG if out_len is not long enough;
           *                 -EINVAL if there is no equals sign in the input;
           *                 0 on success
           */
          int get_key(const char *input, char *out, size_t out_len);
          

          This API doesn't require you to modify the input or allocate memory-- two things that make the code very hard to follow. It gives the caller the choice of how much memory to allocate and how to allocate it (stack, malloc, etc.)

            char pid_buf[21];
            snprintf(pid_buf, 21, "%d", pid);
          

          snprintf(pid_buf, sizeof(pid_buf), "%d", pid); would be considered better style.

          More importantly, it would probably be better for you to use fopen / fprintf / fclose here. If you do want to use the raw open / write / close methods, you have to correctly handle short writes and EINTR, which you are not doing here.

          mount_cgroup: you really need to check the return code of mount and do something if it fails.

          I would also recommend that you allocate hier_path on the stack with

          char hier_path[PATH_MAX]
          

          or similar. Using malloc here is not really worth it.

          stpncpy(buf, hierarchy, strlen(hierarchy));
          

          This seems to do the same thing here as the more widely available and portable strncpy function. (You're not using the pointer which stpncpy returns.) Better yet, use snprintf(buf, sizeof(buf), "%s", hierarchy); which will do bounds checking properly.

          main.c: Just a general comment. I really recommend using getopt_long to parse options. I've seen manual option parsing grow into a huge mess over time as everyone adds "just one more option." You don't have to do that in this change (although it would be easy enough to do), but please consider it in the future.

          Show
          Colin Patrick McCabe added a comment - Thanks for working on this, Andrew! I took a quick look at the native part of this patch. get_kv_key , get_kv_value : I would recommend that your API look something like this: /** * If str is a string of the form key=val, find 'key' * * @param input The input string * @param out Where to put the output string. * @param out_len The length of the output buffer. * * @ return -ENAMETOOLONG if out_len is not long enough; * -EINVAL if there is no equals sign in the input; * 0 on success */ int get_key( const char *input, char *out, size_t out_len); This API doesn't require you to modify the input or allocate memory-- two things that make the code very hard to follow. It gives the caller the choice of how much memory to allocate and how to allocate it (stack, malloc, etc.) char pid_buf[21]; snprintf(pid_buf, 21, "%d" , pid); snprintf(pid_buf, sizeof(pid_buf), "%d", pid); would be considered better style. More importantly, it would probably be better for you to use fopen / fprintf / fclose here. If you do want to use the raw open / write / close methods, you have to correctly handle short writes and EINTR , which you are not doing here. mount_cgroup : you really need to check the return code of mount and do something if it fails. I would also recommend that you allocate hier_path on the stack with char hier_path[PATH_MAX] or similar. Using malloc here is not really worth it. stpncpy(buf, hierarchy, strlen(hierarchy)); This seems to do the same thing here as the more widely available and portable strncpy function. (You're not using the pointer which stpncpy returns.) Better yet, use snprintf(buf, sizeof(buf), "%s", hierarchy); which will do bounds checking properly. main.c : Just a general comment. I really recommend using getopt_long to parse options. I've seen manual option parsing grow into a huge mess over time as everyone adds "just one more option." You don't have to do that in this change (although it would be easy enough to do), but please consider it in the future.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549422/YARN-147-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/93//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/93//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549422/YARN-147-v2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/93//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/93//console This message is automatically generated.
          Hide
          Andrew Ferguson added a comment -

          thanks for the additional comments, Tucu! I've updated the patch as per your review. hopefully I have done everything correctly.

          btw, I have this patch in github:
          https://github.com/adferguson/hadoop-common/tree/adf-yarn-147
          you can see the changes for this patch in the most recent commit.

          thanks!
          Andrew

          Show
          Andrew Ferguson added a comment - thanks for the additional comments, Tucu! I've updated the patch as per your review. hopefully I have done everything correctly. btw, I have this patch in github: https://github.com/adferguson/hadoop-common/tree/adf-yarn-147 you can see the changes for this patch in the most recent commit. thanks! Andrew
          Hide
          Alejandro Abdelnur added a comment -

          Andrew, thanks for updating the patch. Mostly minor stuff:

          • CgroupsLCEResourcesHandler has an unused import for FileNotFoundException
          • CgroupsLCEResourcesHandler, on the double '//', on the init() method, trim the '/' is present at the beginning or end of the cgroupPrefix
          • CgroupsLCEResourcesHandler, parseMtab, no need have a local var for the fReader, the FileReader creation could be done directly in the BufferedReader constructor.
          • CgroupsLCEResourcesHandler, parseMtab, the last exception should print MTAB_FILE instead the BufferReader instance.
          • CgroupsLCEResourcesHandler, on parseMtab error I think we should thrown an exception to halt things. My reasoning is that if I've configured things to use cgroups, I'd expect them to be working as opposed to have a warning message lost in the logs. This would help identify misconfigurations.
          Show
          Alejandro Abdelnur added a comment - Andrew, thanks for updating the patch. Mostly minor stuff: CgroupsLCEResourcesHandler has an unused import for FileNotFoundException CgroupsLCEResourcesHandler, on the double '//', on the init() method, trim the '/' is present at the beginning or end of the cgroupPrefix CgroupsLCEResourcesHandler, parseMtab, no need have a local var for the fReader, the FileReader creation could be done directly in the BufferedReader constructor. CgroupsLCEResourcesHandler, parseMtab, the last exception should print MTAB_FILE instead the BufferReader instance. CgroupsLCEResourcesHandler, on parseMtab error I think we should thrown an exception to halt things. My reasoning is that if I've configured things to use cgroups, I'd expect them to be working as opposed to have a warning message lost in the logs. This would help identify misconfigurations.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549094/YARN-147-v1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/89//testReport/
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/89//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12549094/YARN-147-v1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/89//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/89//console This message is automatically generated.
          Hide
          Andrew Ferguson added a comment -

          hi Tucu,

          thanks very much for opening this new jira and reviewing the patch. I've updated a new version which addresses most of your comments.

          answers to the questions in your review:
          .bq cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default?

          I've added a check to fail if not set. as far as I can tell, there isn't a single default path for cgroups – some distributions use "/sys/fs/cgroup", some use "/cgroup", others, "/cgroups". I've even seen "/mnt/cgroup" (Debian perhaps?); these also vary across releases of the same distro.

          .bq default value for cgroupPrefix has '/', here will produce a '//' in the path

          yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can be a path (which is why I kept the '/') and when I use it, I also added a '/' in case the user did not put a '/' at the right place in the prefix. my understanding is that on Unix, '//' in a path is interpreted as '/', no?

          .bq Nf the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception?

          eh, we could go either way here, but I think it's reasonable to not throw the exception. if the file can't be read, then the map from cgroup controller to path isn't built, and we already have existing checks which skip controllers which can't be found in the path (say, if the file can be read correctly, but the CPU controller isn't mounted).

          ok, great. I'm going to mark this as "patch available" and see if the findbugs warning has gone away (I can't seem to get it to run locally).

          thanks!!
          Andrew

          Show
          Andrew Ferguson added a comment - hi Tucu, thanks very much for opening this new jira and reviewing the patch. I've updated a new version which addresses most of your comments. answers to the questions in your review: .bq cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default? I've added a check to fail if not set. as far as I can tell, there isn't a single default path for cgroups – some distributions use "/sys/fs/cgroup", some use "/cgroup", others, "/cgroups". I've even seen "/mnt/cgroup" (Debian perhaps?); these also vary across releases of the same distro. .bq default value for cgroupPrefix has '/', here will produce a '//' in the path yes, I made that choice deliberately. I wanted to convey that cgroupPrefix can be a path (which is why I kept the '/') and when I use it, I also added a '/' in case the user did not put a '/' at the right place in the prefix. my understanding is that on Unix, '//' in a path is interpreted as '/', no? .bq Nf the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception? eh, we could go either way here, but I think it's reasonable to not throw the exception. if the file can't be read, then the map from cgroup controller to path isn't built, and we already have existing checks which skip controllers which can't be found in the path (say, if the file can be read correctly, but the CPU controller isn't mounted). ok, great. I'm going to mark this as "patch available" and see if the findbugs warning has gone away (I can't seem to get it to run locally). thanks!! Andrew
          Hide
          Andrew Ferguson added a comment -

          updated patch as per Tucu's review

          Show
          Andrew Ferguson added a comment - updated patch as per Tucu's review
          Hide
          Alejandro Abdelnur added a comment - - edited

          CgroupsLCEResourcesHandler

          a few lines exceed 80 chars, reformat

          CgroupsLCEResourcesHandler.setConf()

          move these assignments to init()
          cgroupMount, default should be true, so the mounts are created if they don't exist
          cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default?

          CgroupsLCEResourcesHandler.pathForCgroup()

          default value for cgroupPrefix has '/', here will produce a '//' in the path

          CgroupsLCEResourcesHandler.updateCgroup()

          LOG.debug should be within an IF LOG.DebugEnabled BLOCK to avoid concatenation of the message if not logging.

          The writer close should be done in a finally after checking f != null. this is generating the findbugs warning

          CgroupsLCEResourcesHandler.parseMtab()

          Why not move the filereader constructor to the try block below

          Nf the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception?

          No need to close fREader, closing in takes care of.

          container-executor.c mount_cgroup()

          Use an ELSE block to avoid return in the middle of the function.

          Show
          Alejandro Abdelnur added a comment - - edited CgroupsLCEResourcesHandler a few lines exceed 80 chars, reformat CgroupsLCEResourcesHandler.setConf() move these assignments to init() cgroupMount, default should be true, so the mounts are created if they don't exist cgroupMountPath, if there is no default we should fail if not set, can't we have a sensible default? CgroupsLCEResourcesHandler.pathForCgroup() default value for cgroupPrefix has '/', here will produce a '//' in the path CgroupsLCEResourcesHandler.updateCgroup() LOG.debug should be within an IF LOG.DebugEnabled BLOCK to avoid concatenation of the message if not logging. The writer close should be done in a finally after checking f != null. this is generating the findbugs warning CgroupsLCEResourcesHandler.parseMtab() Why not move the filereader constructor to the try block below Nf the filereader cannot be open/read, is this acceptable or should stop execution by throwing exception? No need to close fREader, closing in takes care of. container-executor.c mount_cgroup() Use an ELSE block to avoid return in the middle of the function.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548069/YARN-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-YARN-Build/73//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/73//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
          Console output: https://builds.apache.org/job/PreCommit-YARN-Build/73//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12548069/YARN-3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 2 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/73//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/73//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/73//console This message is automatically generated.
          Hide
          Alejandro Abdelnur added a comment -

          duplicating it for test-patch purposes.

          Show
          Alejandro Abdelnur added a comment - duplicating it for test-patch purposes.
          Hide
          Alejandro Abdelnur added a comment -

          Attaching last patch from YARN-3 from Andrew, had to tweak a few imports in LCE java that were not applying.

          Show
          Alejandro Abdelnur added a comment - Attaching last patch from YARN-3 from Andrew, had to tweak a few imports in LCE java that were not applying.

            People

            • Assignee:
              Andrew Ferguson
              Reporter:
              Alejandro Abdelnur
            • Votes:
              0 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development