Description
TestEventGeneration's testForNoDuplicates fails time to time depending on the circumstances of the test.
Attachments
Attachments
- OOZIE-2429-1.patch
- 10 kB
- Ferenc Denes
- OOZIE-2429-2.patch
- 12 kB
- Ferenc Denes
- OOZIE-2429-3.patch
- 11 kB
- Ferenc Denes
- OOZIE-2429-3.patch
- 11 kB
- Ferenc Denes
- OOZIE-2429-addendum-1.patch
- 3 kB
- Ferenc Denes
- OOZIE-2429-addendum-2.patch
- 3 kB
- Ferenc Denes
- OOZIE-2429-addendum-3.patch
- 3 kB
- Robert Kanter
Activity
With the patch I have:
1. Fixed the event generation core, such a way that if the state of the coordinator has not changed there is no event generated.
2. switched off the EventHandlerService in the test, as it has drained some events from the queue while observing the state of that.
3. Separated the testForNoDulicates test to 3, so it is more visible what failed if any.
4. Fixed the check of the order of events in the queue.
Testing JIRA OOZIE-2429
Cleaning local git workspace
----------------------------
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132
. +1 the patch does adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1705
. Tests failed: 3
. Tests errors: 0
. The patch failed the following testcases:
. testPurgeXCommandFailed(org.apache.oozie.command.TestPurgeXCommand)
. testPurgeWFWithSubWF1(org.apache.oozie.command.TestPurgeXCommand)
. testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)
+1 DISTRO
. +1 distro tarball builds with the patch
----------------------------
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2697/
A few things:
- I think we're wrapping too much in the if statement added to CoordActionUpdateXCommand. We're interested in the call to generateEvent here, right? But the code is also doing a few other things like updating the last modified time, etc. I think it's safer to add the check for formerCoordinatorStatus and formerCoordinatorPending just around the generateEvent call. For instance:
if (EventHandlerService.isEnabled() && (formerCoordinatorStatus != coordAction.getStatus() || formerCoordinatorPending != coordAction.getPending())) { generateEvent(coordAction, coordJob.getUser(), coordJob.getAppName(), workflow.getStartTime()); } else { Log... }
- In the else block, I think we should (a) make it a debug level message and (b) make the message itself a little more clear (the grammar is a little funny). How about LOG.debug("No event generated because Coordinator Action [{0}]'s status [{1}] and pending [{2}] have not changed", coordAction.getId(), coordAction.getStatus(), coordAction.getPending());
- A number of the log messages in CoordActionUpdateXCommand are using String concatenation instead of the "{#}" formatting. Can you fix these while we're here? For an example, see what I did above. The "[" and "]" are not necessary for it to work, but we typically do that to indicate that the value was filled in.
- Instead of commenting out //final WorkflowJobGetJPAExecutor readCmd2 = new WorkflowJobGetJPAExecutor(jobId1); in the test, you can just delete it.
Thanks for the review.
1. In the if statement (as in the patch) we update the modified time, add other commands above generating events. I think we should do all those only in case of any real modification. That's why I have kept all those in there. I think it is right this way, and also safer to handle all those together. Kept it in there, if you still think it is necessary to ammend I will do it.
2. I have modified it to debug, however it was info as the other similar log messages were info as well.
3. Fixed all those.
4. Removed that.
Testing JIRA OOZIE-2429
Cleaning local git workspace
----------------------------
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132
. +1 the patch does adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1706
. Tests failed: 6
. Tests errors: 0
. The patch failed the following testcases:
. testBundleStatusCoordSubmitFails(org.apache.oozie.service.TestStatusTransitService)
. testActionCheckTransientDuringLauncher(org.apache.oozie.command.wf.TestActionCheckXCommand)
. testCoordMaterializeTriggerService3(org.apache.oozie.service.TestCoordMaterializeTriggerService)
. testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)
. testPurgeXCommandFailed(org.apache.oozie.command.TestPurgeXCommand)
. testPurgeWFWithSubWF1(org.apache.oozie.command.TestPurgeXCommand)
+1 DISTRO
. +1 distro tarball builds with the patch
----------------------------
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2713/
} if (formerCoordinatorStatus != coordAction.getStatus() || formerCoordinatorPending != coordAction.getPending()) { LOG.info("Updating Coordinator action id: [{0}] status [{1}] to [{2}] pending [{3}] to [{4}]", coordAction.getId(), formerCoordinatorStatus, coordAction.getStatus(), formerCoordinatorPending, coordAction.getPending());
This should go to verifyPrecondition, we don't have to do any processing if there are no change in status or pending fag.
Purshotam Shah Thank for the review. I have spent some more time walking through the precondition verification code, and this as well. I think this solution is more correct fot he following reasons:
The veryfyPrecondition throws an exception in case of a precondition issue. However this check above is to skip some actions and become a no operation with no additional external effect.
Throwing an exception in this case would alter the behaviour of the verification, and probably the intent of that as well.
Also this status change is up to the logic above this condition check. The code/condition mentioned here is for the administration of the changed action (like updating the modification time, SLAs, DB etc).
Can you please see the full execute() function to see if this check is in the right place.
Thanks a lot
Purshotam Shah, the verifyPrecondition takes care of skipping when the workflow and coord action are both running and pending is false. If we add the new if statement to verifyPrecondition, then I think we'll end up skipping updating the coord status and sla status when the coord action is running but the workflow changes to succeeded, failed, etc. Right?
Actually, I think we probably could put something in verifyPrecondition, but it would need to have more logic in it. If you look at execute with the patch, it does a no-op if the if statements at the beginning which check the workflow status don't change the coord action's status. I think we could probably put some of the logic in the verifyPrecondition.
I think the patch try to avoid any operation if there are no changes in coord action. This can be easily done by adding extra and condition in existing verifyPrecondition.
protected void verifyPrecondition() throws CommandException, PreconditionException {
// if coord action is RUNNING and pending false and workflow is RUNNING, this doesn't need to be updated.
// if coord action status == workflow, then coord action is already updated
if ( workflow.getStatus().equals(coordAction.getStatus()) && !coordAction.isPending()) {
try
catch (JPAExecutorException je)
{ throw new CommandException(je); } throw new PreconditionException(ErrorCode.E1100, ", workflow is RUNNING and coordinator action is RUNNING and pending false");
}
}
Purshotam Shah, Robert Kanter thanks for the comments. I have attached the new patch with the required logic.
Testing JIRA OOZIE-2429
Cleaning local git workspace
----------------------------
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132
. +1 the patch does adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1764
. Tests failed: 3
. Tests errors: 0
. The patch failed the following testcases:
. testBundleStatusCoordSubmitFails(org.apache.oozie.service.TestStatusTransitService)
. testCoordMaterializeTriggerService3(org.apache.oozie.service.TestCoordMaterializeTriggerService)
. testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)
+1 DISTRO
. +1 distro tarball builds with the patch
----------------------------
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2773/
Testing JIRA OOZIE-2429
Cleaning local git workspace
----------------------------
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132
. +1 the patch does adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1764
. Tests failed: 1
. Tests errors: 0
. The patch failed the following testcases:
. testSamplers(org.apache.oozie.util.TestMetricsInstrumentation)
+1 DISTRO
. +1 distro tarball builds with the patch
----------------------------
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2775/
Rebuilt it, just the good old testSamplers remained, there is a jira for that one as well: OOZIE-2466.
Please review the changes.
Thanks Ferenc for fixing this and Puru for help reviewing. Committed to trunk!
The patch is order agnostic now. The event queue is not deterministic.
Now it constantly fails, however in the previous test it passed, so now only testing if all the events are in the queue once and only once.
Two minor things on the amendment patch:
- With
+ HashMap<AppType,JobEvent> eventsMap = new HashMap<>(); + for (int i=0; i<3; ++i){ + JobEvent event = (JobEvent) queue.poll(); + eventsMap.put(event.getAppType(), event); + } + + assertEquals(3, eventsMap.size());
The for loop shouldn't assume 3 items. If there are less, then queue.poll() will throw an Exception. If there are more, then the assertEquals statement won't catch it. It should loop through all items in queue. The assertEquals will ensure that we only have the expected 3 items.
- I don't think we need the curly brace enclosures
Robert Kanter Thanks for the review.
1. I have checked the size of the queue in the previous line, just above the code quoted here. I made the change now.
The assertEquals is there to catch if there are duplicate elements, in which case the size will be less than 3.
2. The braces are there to ensure that the variables are not mixed up. It is not only about formatting. Like it was previously, when one asserted on an unrelated variable by mistake in this very test. In this way the compiler checks this case for you. I have kept it as is.
Attached the new patch.
Ok, the braces are fine.
I kicked off Jenkins, the find-patches job seems to be stuck
Testing JIRA OOZIE-2429
Cleaning local git workspace
----------------------------
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132
. +1 the patch does adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
-1 COMPILE
. +1 HEAD compiles
. -1 patch does not compile
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS - patch does not compile, cannot run testcases
-1 DISTRO
. -1 distro tarball fails with the patch
----------------------------
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2781/
We're still using Java 6, so you can't use the diamond operator:
new HashMap<>();
You have to do the old way:
new HashMap<AppType,JobEvent>()
I've modified your addendum patch with that change and kicked off Jenkins again. (addendum-3.patch)
Testing JIRA OOZIE-2429
Cleaning local git workspace
----------------------------
+1 PATCH_APPLIES
+1 CLEAN
+1 RAW_PATCH_ANALYSIS
. +1 the patch does not introduce any @author tags
. +1 the patch does not introduce any tabs
. +1 the patch does not introduce any trailing spaces
. +1 the patch does not introduce any line longer than 132
. +1 the patch does adds/modifies 1 testcase(s)
+1 RAT
. +1 the patch does not seem to introduce new RAT warnings
+1 JAVADOC
. +1 the patch does not seem to introduce new Javadoc warnings
+1 COMPILE
. +1 HEAD compiles
. +1 patch compiles
. +1 the patch does not seem to introduce new javac warnings
+1 BACKWARDS_COMPATIBILITY
. +1 the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations
. +1 the patch does not modify JPA files
-1 TESTS
. Tests run: 1768
. Tests failed: 1
. Tests errors: 0
. The patch failed the following testcases:
. testCoord_throwException(org.apache.oozie.command.coord.TestCoordChangeXCommand)
+1 DISTRO
. +1 distro tarball builds with the patch
----------------------------
-1 Overall result, please check the reported -1(s)
The full output of the test-patch run is available at
. https://builds.apache.org/job/oozie-trunk-precommit-build/2782/
Looks like that not only the the test is wrong (where the EventHandlerService manipulates the event queue in the background), but there are really duplicate COORDINATOR_ACTION - FAILURE messages. We have to modify the event generation by checking if the coordinator action's state has really changed at the update stage.
The test passes in some cases, where the background processes are slow to add the last event. W have to make sure that all the related actions are complete by the time we check the results.
Also the testForNoDuplicates checks 3 things. These should be separated out to 3 tests (interestingly one of the tests inside waits for an other one to happen. This shows one of the problems with this practice).