about 1000 app running on cluster, jprofiler found pullJustFinishedContainers cost too much cpu.
Ensure all completed containers are reported to the AMs across RM restart
thanks Jason Lowe Daniel Templeton Jian He and Rohith Sharma K S for the review and commit!
SUCCESS: Integrated in Hadoop-trunk-Commit #10253 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10253/)
YARN-5483. Optimize RMAppAttempt#pullJustFinishedContainers. Contributed (jlowe: rev e0b570dffb47ede298e0378a63350b699128d96e)
Thanks to sandflee for the contribution and to Daniel Templeton, Rohith Sharma K S, and Jian He for additional review! I committed this to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6.
+1 for the 2.7 and 2.6 patch as well. Committing this.
This message was automatically generated.
update the patch.
seems we should merge YARN-5262 to 2.6/2.7 too.
Good catch! I committed YARN-5262 to 2.8, 2.7, and 2.6. Could you rebase the 2.7 and 2.6 patches? Trunk patch lgtm.
without YARN-5262 , there will be too many FINISHED_CONTAINERS_PULLED_BY_AM event, seems we should merge YARN-5262 to 2.6/2.7 too.
1, update a new patch to fix some other diamond operator issues
2, in branch-2.7.patch
this leads to compile errors about "java.util.ArrayList<java.lang.Object>can't cast to java.util.List<org.apache.hadoop.yarn.api.records.ContainerStatus>)", I don;t known why, so removed the related fix.
LGTM. +1 (non-binding)
I don't think YARN-5262 breaks this jira.
Yeah, it looks like YARN-1372 is the source, and that went into 2.6. As such it'd be nice to have this go into 2.7.4 and 2.6.5 since it seems like an important performance fix.
Thanks for updating the patch! Latest one looks good to me as well. Could you provide patches for 2.7/2.6 as well? Note that 2.6 does not support JDK7, so we can't use the diamond operator there.
update the patch, rename keepContainersAcressAttempts to keepContainersAcrossAppAttempts
thanks for the catch ! A minor typo in the existing code, 'keepContainersAcressAttempts', mind fixing the same ?
Thanks sandflee for the confirmation, I will just remove the broken by link.
thanks Rohith Sharma K S, I don't think YARN-5262 breaks this jira.
Thanks sandflee for the patch. Good catch!! The overall patch make sense to me.
One small doubt, how YARN-5262 breaks this JIRA?
test failure seems not related to the patch, the test may fail with or without the patch. modified the test timeout from 5s to 50s, the test could always succ
thanks Daniel Templeton, update the patch to add diamond operator to finishedContainersSentToAM and justFinishedContainers related code.
Looks good to me. Looks like the call to pullJustFinishedContainers() wasn't clearing the list of just finished containers per the method contract. Nice catch.
With these lines:
can you please use the diamond operator in the list constructor and merge them back into one line?
Thanks for the report and patch, sandflee!
+1, patch looks good to me. I will commit this tomorrow if there are no objections.
YARN-5262 is patched. found justFinishedContainers is never cleared, and there are many unnecessarily update to justFinishedContainers.
// Clear and get current values
List<ContainerStatus> finishedContainers = justFinishedContainers.put
(nodeId, new ArrayList<ContainerStatus>());