Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.3.0
-
None
-
Reviewed
Attachments
Attachments
- YARN-1339v6.patch
- 29 kB
- Jason Darrell Lowe
- YARN-1339v5.patch
- 29 kB
- Jason Darrell Lowe
- YARN-1339v4.patch
- 40 kB
- Jason Darrell Lowe
- YARN-1339v3-and-YARN-1987.patch
- 46 kB
- Jason Darrell Lowe
- YARN-1339v2.patch
- 39 kB
- Jason Darrell Lowe
- YARN-1339.patch
- 42 kB
- Jason Darrell Lowe
Issue Links
- requires
-
YARN-1987 Wrapper for leveldb DBIterator to aid in handling database exceptions
- Closed
Activity
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633771/YARN-1339.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3312//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3312//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12639285/YARN-1339v2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3536//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3536//console
This message is automatically generated.
Updating the patch to address the DBException handling that was brought up in the MAPREDUCE-5652 review and applies here. Note that this now depends upon YARN-1987 as that provides the utility wrapper for the leveldb iterator to translate raw RuntimeException to the more helpful DBException so we can act accordingly when errors occur. The other notable change in the patch is renaming LevelDB to Leveldb for consistency with the existing LeveldbTimelineStore naming convention.
This latest patch includes the necessary pieces of YARN-1987 so it can compile and Jenkins can comment.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12642859/YARN-1339v3-and-YARN-1987.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3673//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3673//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12645284/YARN-1339v4.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3759//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3759//console
This message is automatically generated.
Hi jlowe, thanks for your patch. I am reviewing it now. Would you mind to sync v4 patch against latest trunk? Thanks!
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12649398/YARN-1339v5.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3944//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3944//console
This message is automatically generated.
Patch looks good overall. Some minor comments:
In DeletionService.java,
+ recordDeletionTaskInStateStore(task);
Shall we add "if (stateStore.canRecover())" so that we only do record work when stateStore in levelDB?
+ public synchronized FileDeletionTask[] getSuccessorTasks() { + FileDeletionTask[] successors = + new FileDeletionTask[successorTaskSet.size()]; + return successorTaskSet.toArray(successors); + }
Does it necessary for turning set into an array here? I only see iteratoring elements later. If so, it could be simpler to leave it as set.
+ private void recover(RecoveredDeletionServiceState state)
...
May be we should quickly return from recover method if state is empty for cases that NM get first-time start or no scheduled fileDeletionTask during NM restart.
+ basePaths.add(new Path(new URI(basedir)));
I think this is the same with basePaths.add(new Path(basedir)). Also I think some path.toUri().toString() can be replaced with path.toString() directly.
In NMNullStateStoreService.java,
+ public RecoveredDeletionServiceState loadDeletionServiceState() + throws IOException { + throw new UnsupportedOperationException( + "Recovery not supported by this state store"); + } + + @Override + public void storeDeletionTask(int taskId, + DeletionServiceDeleteTaskProto taskProto) throws IOException { + } + + @Override + public void removeDeletionTask(int taskId) throws IOException { + }
May be we should throw exceptions in all three methods to keep consistent?
Thanks for the review, Junping!
Shall we add "if (stateStore.canRecover())" so that we only do record work when stateStore in levelDB?
Fixed. I added an early-out for this in the recordDeletionTaskInStateStore method.
Does it necessary for turning set into an array here? I only see iteratoring elements later. If so, it could be simpler to leave it as set.
Theoretically yes. We need to return a copy of the successors because the set is protected by synchronization on the FileDeletionTask. If we return it directly then we could corrupt it via concurrency. In practice this probably doesn't occur, but for maintenance sake I played it on the safe side. I don't think this is going to be a performance problem because in deletion tasks typically don't have very many successors. I suppose we could use a ConcurrentMap instead of a HashSet and expose direct access to it, but that seemed like overkill – there might be a lot of deletion tasks lingering around and concurrent maps are more memory-intensive. Or we could use a SynchronizedSet and make sure the iterating code locks it appropriately. Let me know if something needs to change here.
May be we should quickly return from recover method if state is empty for cases that NM get first-time start or no scheduled fileDeletionTask during NM restart.
The method is already going to be pretty quick if there aren't any deletion tasks recovered. In that case all it will do is allocate a hash map and hash set, iterate over the empty list of tasks, iterate over the empty hash map and return. Given the recover method is only called on startup once and isn't that expensive in the nothing-to-do case, an early out seems like an unnecessary optimization – or am I missing something?
I think this is the same with basePaths.add(new Path(basedir)). Also I think some path.toUri().toString() can be replaced with path.toString() directly.
Fixed. I vaguely recall having some issues using the raw paths on an early prototype of the code, but it seems to be working fine without explicit URI usage.
May be we should throw exceptions in all three methods to keep consistent?
The point of the null state store is to silently eat attempts to store. It only throws for recovery methods to note to developers that there is no way to recover from this state store. If we put exceptions in all the store methods then we'll have to sprinkle canRecover() checks throughout the code which I'd rather avoid if possible. NullRMStateStore behaves the same way as do the other store methods in the null state store, so in that sense it's consistent.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12649673/YARN-1339v6.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3951//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3951//console
This message is automatically generated.
Thanks for addressing my comments, jlowe!
+1. The v6 patch LGTM, will commit it shortly.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12649673/YARN-1339v6.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4005//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4005//console
This message is automatically generated.
I have commit this to trunk and branch-2. Thanks jlowe for contributing a patch!
SUCCESS: Integrated in Hadoop-trunk-Commit #5715 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5715/)
YARN-1339. Recover DeletionService state upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603036)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
SUCCESS: Integrated in Hadoop-Yarn-trunk #586 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/586/)
YARN-1339. Recover DeletionService state upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603036)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1777 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1777/)
YARN-1339. Recover DeletionService state upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603036)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1804 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1804/)
YARN-1339. Recover DeletionService state upon nodemanager restart. (Contributed by Jason Lowe) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1603036)
- /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDeletionService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
- /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
Patch to allow recovery of deletion service state after an NM restart. Like the other NM restart patches, the state store is abstracted and a leveldb service is used when recovery is enabled. When it is disabled a null state store is used.