|
Report for test failure
This one fails for me consistently on this box now (though I have some local changes). Will test with clean checkout and one some other environments tomorrow.
Fails in Eclipse too so should be able to debug this. Bit tired now so this'll have to wait.
There's an high chance this is a problem in the test suite, but I'm not sure.
If debugging it takes too much maybe it's better to remove the tests at all. BTW what is the exact test method failing? From the ant output I can only see RemoteDeliveryTest but not the method. Testcase: test1 took 1.307 sec
Testcase: test2_0 took 2.019 sec Testcase: test2 took 2.049 sec Testcase: test3 took 0.037 sec Testcase: test4 took 2.087 sec Testcase: test5 took 0.071 sec FAILED expected:<5> but was:<2> junit.framework.AssertionFailedError: expected:<5> but was:<2> at org.apache.james.transport.remotedeliverytester.AbstractRemoteDeliveryTest.assertWhole(AbstractRemoteDeliveryTest.java:180) at org.apache.james.transport.remotedeliverytester.AbstractRemoteDeliveryTest.doTest5(AbstractRemoteDeliveryTest.java:388) at org.apache.james.transport.remotedeliverytester.AbstractRemoteDeliveryTest.test5(AbstractRemoteDeliveryTest.java:70) For more details, see the attached report Failure seems to be intermittent. Passed fine for me today.
I'm beginning to think that this has something to do with time. It's now 10:49 BST (about the same time as yesterday) and it's now consistently failing for me.
Correction - didn't fail on my single core box running Java6.
[junit] Running org.apache.james.transport.mailets.RemoteDeliveryTest
[junit] Tests run: 6, Failures: 1, Errors: 0, Time elapsed: 6.854 sec [junit] Mail mail-to-test.it stored. [junit] Unlocked mail-to-test.it for Remote delivery thread (0) @ Thu Jul 31 20:35:51 BST 2008 [junit] Method accept(Filter) called [junit] Found item mail-to-test.it in spool. [junit] Locked mail-to-test.it for Remote delivery thread (0) @ Thu Jul 31 20:35:51 BST 2008 [junit] accept(Filter) has locked: mail-to-test.it [junit] Retrieving mail: mail-to-test.it [junit] Unlocked mail-to-test.it for Remote delivery thread (0) @ Thu Jul 31 20:35:51 BST 2008 Looking at the attachment and the code I would say the issue is that sometimes the time we wait for the spool to be processed is not enough:
----- // Wait for empty spool assertEquals(0, waitEmptySpool(10000)); ----- The problem is that 10 seconds to process 2 mails should be really enough even on slow machines, so there must be something in locking/notification for the RemoteDelivery threads. I added a Thread.dumpStack on AssertionFailedError to test5 Basically what test5 expect is the sender to try a first send, one addres has a permanent failure and one a temporary failure, so "half mail" is stored back in outgoing and retried again for delivery multiple times. This only happens once and then it stops: it's unlikely a deadlock, more probably a notification issue, but let's see if the dumpStack give us hints! I have to check the InMemorySpoolRepository to understand if notification/locking is managed the same way of the official file/db repositories, too. I identified a possible synchronization issue in the Tester. Applied a fix, let me know if you can still see the problem or not!
Still get:
[junit] Tests run: 6, Failures: 1, Errors: 0, Time elapsed: 6.597 sec [junit] Mail mail-to-test.it stored. [junit] Unlocked mail-to-test.it for Remote delivery thread (0) @ Sat Aug 02 08:57:51 BST 2008 [junit] Method accept(Filter) called [junit] Found item mail-to-test.it in spool. [junit] Locked mail-to-test.it for Remote delivery thread (0) @ Sat Aug 02 08:57:51 BST 2008 [junit] accept(Filter) has locked: mail-to-test.it [junit] Retrieving mail: mail-to-test.it [junit] Unlocked mail-to-test.it for Remote delivery thread (0) @ Sat Aug 02 08:57:51 BST 2008 Robert: was the failure still in test5 ? I added untested debug code to create a stack dump (via reflection using jvm 5 methods) on a failure: is there any output for this?
I think that this issue is only exhibited by faster, multicore machines. By adding a sleep after assertEquals(0, waitEmptySpool(10000)); the number of messages receives varies in proportion to the duration of the sleep. In addition, I wonder whether Tester needs more synchronization.
This is interesting!
First I noticed there are too many delivery threads: I forgot to destroy the RemoteDelivery after each test and they was not stopped/disallocated. I committed a fix for this, but I don't think this was the issue. Rather I see a weird stack for a delivery thread: Thread Thread[Remote delivery thread (0),5,main] STE: sun.reflect.ClassFileAssembler.emitShort(ClassFileAssembler.java:45) STE: sun.reflect.ClassFileAssembler.emitConstantPoolClass(ClassFileAssembler.java:96) STE: sun.reflect.AccessorGenerator.emitBoxingContantPoolEntries(AccessorGenerator.java:310) STE: sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:337) STE: sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:95) STE: sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:313) STE: java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1299) STE: java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:52) STE: java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:420) STE: java.security.AccessController.doPrivileged(Native Method) STE: java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:400) STE: java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) STE: java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1035) STE: java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302) STE: java.util.HashMap.writeObject(HashMap.java:1038) STE: sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) STE: sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) STE: java.lang.reflect.Method.invoke(Method.java:585) STE: java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:917) STE: java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1339) STE: java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290) STE: java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079) STE: java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302) STE: org.apache.james.core.MailImpl.cloneSerializableObject(MailImpl.java:614) STE: org.apache.james.core.MailImpl.<init>(MailImpl.java:162) STE: org.apache.james.test.mock.james.InMemorySpoolRepository.store(InMemorySpoolRepository.java:150) STE: org.apache.james.transport.remotedeliverytester.TesterMailetConfig$SpoolRepositoryWrapper.store(TesterMailetConfig.java:181) STE: org.apache.james.transport.mailets.RemoteDelivery.run(RemoteDelivery.java:1264) STE: java.lang.Thread.run(Thread.java:595) This could be the one we are investigating, because the RemoteDelivery.java:1264 is the place where a message has been delayed (temporary failure) and it is stored to the spool to be further processed, but it seems to be stuck during a clone operation. The weird thing is that under a similar circumstance we would have at least 1 mail in the spool and the assertEquals(0, waitEmptySpool(xxxx)) should fail, instead it fails later on the assertWhole. I'm confused, ATM... Robert, about your comment on "assertEquals(0, waitEmptySpool(10000));"
The weird thing is that the mechanism of waitEmptySpool(10000) is meant to wait that time (that should really be enough because the delivery thread is configured for 500ms delayed retries) and after the time returns the number of mails present in the spool. The fact that the assert is not failing there let me think that the spool is empty and the delivery completed his job, that's why I thought it could have been an issue in the Tester count method BUT I'm comparing your delivery output with mine and it is identical excepting that your report ends after the first "store", while mine continue with unlocking the message and doing the remaining attempts. So the problem is not in the Tester counter, but it really seems to be that it got stuck after the first partial delivery. I don't understand why it does not fail on the first assert, but the issue could be the remote delivery thread is stuck on the MailImpl cloneSerializable method, and this would be an issue in server code, so we have to keep investigating. PS: I'm running the code in a dual core machine, too, but I don't see this issue. I tested java 1.4, 5 and 6 on a 32bit windows platform and java 1.4 and 5 on linux, and I can never reproduce this! I run 64bit linux with a 64bit JVM.
Updated and ran again
Note this is a different test failure. Will run again and see I can reproduce a Test 5 failure.
I noticed an issue in testMulti in Hudson too, it was on @d.it domains, so maybe this time is an issue in the test. I'll monitor more failure to get a better idea while we investigate on the test5 issue.
Interestingly I run it now on my machine and I reproduced a test5 failure, the logging was: ------------- Thread Thread[Remote delivery thread (0),5,main] - java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390) - java.lang.StringBuilder.append(StringBuilder.java:119) - java.lang.Class.getDeclaredMethod(Class.java:1937) - java.io.ObjectStreamClass.getInheritableMethod(ObjectStreamClass.java:1349) - java.io.ObjectStreamClass.access$2200(ObjectStreamClass.java:52) - java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:450) - java.security.AccessController.doPrivileged(Native Method) - java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:413) - java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:310) - java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1106) - java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509) - java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474) - java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) - java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) - java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) - java.util.HashMap.writeObject(HashMap.java:1001) - sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) - sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) - java.lang.reflect.Method.invoke(Method.java:597) - java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:945) - java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1461) - java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392) - java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150) - java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326) - org.apache.james.core.MailImpl.cloneSerializableObject(MailImpl.java:614) - org.apache.james.core.MailImpl.<init>(MailImpl.java:162) - org.apache.james.test.mock.james.InMemorySpoolRepository.store(InMemorySpoolRepository.java:150) - org.apache.james.transport.remotedeliverytester.TesterMailetConfig$SpoolRepositoryWrapper.store(TesterMailetConfig.java:181) - org.apache.james.transport.mailets.RemoteDelivery.run(RemoteDelivery.java:1264) - java.lang.Thread.run(Thread.java:619) -------------- So, again, the deliverer thread is stuck in that cloneSerializableObject (the stack is not the same, but similar, maybe the difference are in JVM version/platform). In real world the spoolrepository implementations we use have a real write/read to disk/jdbc and they probably don't use the "copy" constructor we use in InMemorySpoolRepository, but this constructor is used in many places in James code, so if we have an issue there we should better find it even if this won't hit real world RemoteDelivery but it is more probable it will happen in the LinearProcessor (spooler), and in the Redirect/Forward mailets tree. Agreed
I wonder whether we might be able to simply the implementation by using poor-man's-clone only for attributes rather than the complete hash map. Maybe I'll try find to give it a while locally. One major change I've wanted to propose for a while is a move to a data+meta-data message paradigm. The idea is that the incoming message would not be constantly transformed during passage, rather the message would be stored and any processing done only at the meta-data level. Only on exgress would changes be consolidated. This would help to reduce copying of the message content as well as offers potential simplifications of the architecture. Attached is the log for the patch below. This is quite interesting: the problem seems to be calculating the SUID for the exception stored against delivery-error.
Index: core-library/src/main/java/org/apache/james/core/MailImpl.java =================================================================== --- core-library/src/main/java/org/apache/james/core/MailImpl.java (revision 682119) +++ core-library/src/main/java/org/apache/james/core/MailImpl.java (working copy) @@ -158,16 +158,17 @@ setRemoteAddr(mail.getRemoteAddr()); setLastUpdated(mail.getLastUpdated()); try { - if (mail instanceof MailImpl) { - setAttributesRaw((HashMap) cloneSerializableObject(((MailImpl) mail).getAttributesRaw())); - } else { - HashMap attribs = new HashMap(); - for (Iterator i = mail.getAttributeNames(); i.hasNext(); ) { - String hashKey = (String) i.next(); - attribs.put(hashKey,cloneSerializableObject(mail.getAttribute(hashKey))); - } - setAttributesRaw(attribs); + // NOTE: switching to less performance implementation in an attempt to discover + // NOTE: why optimised method is intermittently sticking + HashMap attribs = new HashMap(); + for (Iterator i = mail.getAttributeNames(); i.hasNext(); ) { + String hashKey = (String) i.next(); + System.err.println("ATT[START]@" + System.currentTimeMillis() + " " + hashKey); + final Serializable attribute = mail.getAttribute(hashKey); + attribs.put(hashKey,cloneSerializableObject(attribute)); + System.err.println("ATT[END]@" + System.currentTimeMillis() + " " + hashKey); } + setAttributesRaw(attribs); } catch (IOException e) { // should never happen for in memory streams setAttributesRaw(new HashMap()); @@ -609,6 +610,7 @@ * @throws ClassNotFoundException */ private static Object cloneSerializableObject(Object o) throws IOException, ClassNotFoundException { + System.err.println("CSO[START]@" + System.currentTimeMillis() + " " + o); ByteArrayOutputStream b = new ByteArrayOutputStream(); ObjectOutputStream out = new ObjectOutputStream(b); out.writeObject(o); @@ -617,6 +619,7 @@ ByteArrayInputStream bi=new ByteArrayInputStream(b.toByteArray()); ObjectInputStream in = new ObjectInputStream(bi); Object no = in.readObject(); + System.err.println("CSO[END]@" + System.currentTimeMillis() + " " + o); return no; } Very interesting. I can't understand why a similar issue (calculateSUID) is intermitant.
In this case there is another test-related variable: the exception throwed by the tester are specific to the tester and not the one from javamail, so maybe there are serialization issues there. I just added the serialversionUID to the 3 involved exceptions. We see the issue is in the attributes copy but we are not sure that the issue is in a specific method: one thing that came to mind is that some racing issue could make the tester to fail creating an appropriate SendFailedException tree (the nesting is important there) and creates some infinite loop that make the serializer stuck dealing with it... Maybe we can add a debug to dump the full exception tree (by manually looping through childs)... Using the following patch, it appears that there's a certain amount of circularity
Index: core-library/src/main/java/org/apache/james/core/MailImpl.java =================================================================== --- core-library/src/main/java/org/apache/james/core/MailImpl.java (revision 682170) +++ core-library/src/main/java/org/apache/james/core/MailImpl.java (working copy) @@ -27,7 +27,10 @@ import org.apache.mailet.MailAddress; import org.apache.mailet.RFC2822Headers; +import com.sun.mail.smtp.SMTPAddressFailedException; + import javax.mail.MessagingException; +import javax.mail.SendFailedException; import javax.mail.internet.InternetAddress; import javax.mail.internet.MimeMessage; import javax.mail.internet.ParseException; @@ -158,16 +161,19 @@ setRemoteAddr(mail.getRemoteAddr()); setLastUpdated(mail.getLastUpdated()); try { - if (mail instanceof MailImpl) { - setAttributesRaw((HashMap) cloneSerializableObject(((MailImpl) mail).getAttributesRaw())); - } else { - HashMap attribs = new HashMap(); - for (Iterator i = mail.getAttributeNames(); i.hasNext(); ) { - String hashKey = (String) i.next(); - attribs.put(hashKey,cloneSerializableObject(mail.getAttribute(hashKey))); + HashMap attribs = new HashMap(); + for (Iterator i = mail.getAttributeNames(); i.hasNext(); ) { + String hashKey = (String) i.next(); + final Serializable attribute = mail.getAttribute(hashKey); + if (attribute instanceof SendFailedException) { + SendFailedException ex = (SendFailedException) attribute; + Throwable t = ex.getNextException(); + debug(t); } - setAttributesRaw(attribs); + final Object cloneSerializableObject = cloneSerializableObject(attribute); + attribs.put(hashKey,cloneSerializableObject); } + setAttributesRaw(attribs); } catch (IOException e) { // should never happen for in memory streams setAttributesRaw(new HashMap()); @@ -176,6 +182,28 @@ setAttributesRaw(new HashMap()); } } + private void debug(Throwable t) { + if (t == null) { + System.err.println("[END]"); + } else { + System.err.println("NEXT@" + System.identityHashCode(t) + t ); + if (t instanceof MessagingException) { + MessagingException e = (MessagingException) t; + System.err.println("NEXT@" + System.identityHashCode(e) + e ); + t = e.getNextException(); + if (t == null) { + t = e.getCause(); + } + debug(t); + } else { + System.err.println("NEXT@" + System.identityHashCode(t) + t ); + Throwable next = t.getCause(); + if (next != t) { + debug(next); + } + } + } + } /** * A constructor that creates a MailImpl with the specified name, I don't see circularity.
We have an SMTPAddressFailedException 500I and a nested SMTPAddressFailedException 400V. It is logged twice because the patch seems to log twice. From the log I would say that the exception has a correct structure. This whole thing is very weird... I don't even know how to search for a similar bug in the sun java database.. maybe we should give Harmony a try for this... never tried Harmony, maybe this is the right opportunity... The main problem is that it is intermitant... I think it's a race condition involving concurrent initial serialization. Which would be a JVM bug I have no idea how to search for it.
I've been using a patch this afternoon which serializes the exception earlier in RemoteDeliveryTest and haven't had any failures yet. So this might be a suitable workaround for the test. It's hard to know whether an intermittent fault is fixed but I've been running today and it hasn't failed so far. It's a reasonbly clean solution. Would need more comments before committing but please give it a try.
Index: mailets-function/src/test/java/org/apache/james/transport/remotedeliverytester/SMTPAddressFailedException.java =================================================================== --- mailets-function/src/test/java/org/apache/james/transport/remotedeliverytester/SMTPAddressFailedException.java (revision 682219) +++ mailets-function/src/test/java/org/apache/james/transport/remotedeliverytester/SMTPAddressFailedException.java (working copy) @@ -47,7 +47,7 @@ * @param err * An error message for the exception. */ - SMTPAddressFailedException(InternetAddress addr, java.lang.String cmd, int rc, java.lang.String err) { + public SMTPAddressFailedException(InternetAddress addr, java.lang.String cmd, int rc, java.lang.String err) { super(err); this.cmd = cmd; this.rc = rc; Index: mailets-function/src/test/java/org/apache/james/transport/mailets/RemoteDeliveryTest.java =================================================================== --- mailets-function/src/test/java/org/apache/james/transport/mailets/RemoteDeliveryTest.java (revision 682219) +++ mailets-function/src/test/java/org/apache/james/transport/mailets/RemoteDeliveryTest.java (working copy) @@ -19,16 +19,47 @@ package org.apache.james.transport.mailets; +import org.apache.james.core.MailImpl; import org.apache.james.transport.remotedeliverytester.AbstractRemoteDeliveryTest; import org.apache.james.transport.remotedeliverytester.RemoteDeliveryTestable; +import org.apache.james.transport.remotedeliverytester.SMTPAddressFailedException; +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.ObjectInputStream; +import java.io.ObjectOutputStream; import java.util.Properties; +import javax.mail.internet.InternetAddress; + /** * Tester for the standard RemoteDelivery Mailet */ public class RemoteDeliveryTest extends AbstractRemoteDeliveryTest { + private static Object cloneSerializableObject(Object o) throws IOException, ClassNotFoundException { + ByteArrayOutputStream b = new ByteArrayOutputStream(); + ObjectOutputStream out = new ObjectOutputStream(b); + out.writeObject(o); + out.flush(); + out.close(); + ByteArrayInputStream bi=new ByteArrayInputStream(b.toByteArray()); + ObjectInputStream in = new ObjectInputStream(bi); + Object no = in.readObject(); + return no; + } + + static { + try { + cloneSerializableObject(new SMTPAddressFailedException(new InternetAddress(), "Whatever", 1, "Another")); + } catch (Throwable t) { + System.err.println("CANNOT SERIALIZE CLASS"); + t.printStackTrace(); + } + } + + public RemoteDeliveryTestable getDeliverer() { StandardRemoteDeliveryTestable d = new StandardRemoteDeliveryTestable(); d.logEnabled = true; Darn just had a failure: looks like it doesn't work after all
Been playing about with externalisation. Seems to pass ok provided that InternetAddress is not serialized.
I've wasted a lot of time on this now and don't think I'm any nearer to actually solving anything. I just can't develop JAMES when the tests take 13 minutes to run and fail 80% of the time. Maybe it's time to give up on this test...
Still failing with r687810 (which should have Geronimo jar in, I believe). A different modus operandi this time, though.
This is still unresolved. Maybe it's time to consider disabling this test on AMD64.
Index: src/test/java/org/apache/james/transport/remotedeliverytester/AbstractRemoteDeliveryTest.java =================================================================== --- src/test/java/org/apache/james/transport/remotedeliverytester/AbstractRemoteDeliveryTest.java (revision 724099) +++ src/test/java/org/apache/james/transport/remotedeliverytester/AbstractRemoteDeliveryTest.java (working copy) @@ -51,6 +51,15 @@ public abstract RemoteDeliveryTestable getDeliverer(); public abstract Properties getParameters(); + public AbstractRemoteDeliveryTest() { + // This test fails frequently on AMD64 Linux. + // The cause is hard to track down (help welcomed!) + // Seems to be platform rather than code + if ("Linux".equalsIgnoreCase(System.getProperty("os.name")) && "amd64".equalsIgnoreCase(System.getProperty("os.name"))) { + doTest = Integer.MAX_VALUE; + } + } + public void test1() throws Exception { if (doTest == 0 || doTest == 1) doTest1(getDeliverer(), getParameters()); } Think I've fixed this now. I'll reopen again if not.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
[junit] Running org.apache.james.transport.mailets.RemoteDeliveryTest
[junit] Tests run: 6, Failures: 1, Errors: 0, Time elapsed: 7.581 sec
[junit] Mail mail-to-test.it stored.
[junit] Unlocked mail-to-test.it for Remote delivery thread (0) @ Mon Jul 28 22:23:53 BST 2008
[junit] Method accept(Filter) called
[junit] Found item mail-to-test.it in spool.
[junit] Locked mail-to-test.it for Remote delivery thread (0) @ Mon Jul 28 22:23:53 BST 2008
[junit] accept(Filter) has locked: mail-to-test.it
[junit] Retrieving mail: mail-to-test.it
[junit] Unlocked mail-to-test.it for Remote delivery thread (0) @ Mon Jul 28 22:23:53 BST 2008
this box has two cores so i wonder whether it might a concurrency issue of some kind