Issue created based on mailing list discussion here:
Citing the mail content:
First a short background info, what our company would like to achieve using James. We have several clients that need mass-mailing capability, thus we want to create a system that allows us to track these mails and especially store errors from bounces (also by using VERP), so we can block unreachable addresses. I created a few custom mailets to handle those requirements and a custom RemoteDelivery implementation for creating VERP style sender addresses. The storage is all done in a MySQL database via JPA. Currently we use it for one client which is not sending huge amounts of mails, around 80,000 per month.
The system works well, but it happens after a bigger batch of mails (around 4,000 mails) that the mail queueing system gets somehow "stuck". The SMTP server is able to retrieve new mails, but somehow they don't get send out, until I restart the server. Sometimes a flush() on queue/spool helps, but most of the times a restart is necessary.
I added logging to my mailets when they start and finish to see if there is a problem, but they seem to run fine (no exceptions). It seems that the mails that arrive after James gets stuck get processed normally by the mailets, but somehow are not dequeued and sent out.
James 3.0 Beta3
linux 2.6.24-23-xen #1 SMP Mon Jan 26 03:09:12 UTC 2009 x86_64 GNU/Linux
Frankly I'm not sure what is the best way to debug this kind of problem with James. I could find a few other threads with people describing similar problems, but I couldn't get closer to the problem.
Here some data from JMX that I took while the server was in "stuck" status.
Size == 80
Size == 0
CurrentSpoolCount == 0
ThreadCount == 20
TotalTasks == 0
AverageTaskTime == 0.0
ActiveThreads == 2
ActiveTasks == 2
QueuedTasks == 0
MaximalThreads == 2
TotalTasks == 4653
AverageTaskTime == 145.0
ActiveThreads == 20
ActiveTasks == 0
QueuedTasks == 0
MaximalThreads == 20
I also created a jstack output, as it was suggested in one of the threads I found. I can post it here as well if it could be helpful (I didn't do it now, because the file is quite large).
This is the logging output when a new mail is sent while the queue is stuck (IgnoreBlockedMailet and SendAttemptLogMailet are my custom mailets):
DEBUG 03:54:46,819 | james.mailspooler | ==== Begin processing mail Mail1322625286815-aa8034cb-4d50-4044-926f-10bbe8b637e5====
DEBUG 03:54:46,819 | james.mailprocessor | Call MailProcessor root
INFO 03:54:46,827 | james.mailetcontext | IgnoreBlockedMailet called - recipients=[firstname.lastname@example.org]
INFO 03:54:46,847 | james.mailetcontext | IgnoreBlockedMailet finished - recipients=[email@example.com]
DEBUG 03:54:46,847 | james.mailprocessor | Call MailProcessor transport
INFO 03:54:46,863 | james.mailetcontext | SendAttemptLogMailet called - recipients=[firstname.lastname@example.org]
INFO 03:54:46,927 | james.mailetcontext | SendAttemptLogMailet finished - recipients=[email@example.com]
DEBUG 03:54:46,931 | james.mailprocessor | End of mailetprocessor for state root reached
What is missing is this part, which usually follows when the server is functioning (at least follows after a few minutes if the server is busy).
INFO 03:54:46,943 | james.mailetcontext | Attempting delivery of Mail1322620955282-2b903c2d-8378-4a10-848b-6ebd489182bf-to-eggsist.com to host ASPMX.L.GOOGLE.com. at 22.214.171.124 from bounce+t.eichmann=eggsist.com@???.com for addresses [firstname.lastname@example.org]
INFO 03:54:47,671 | james.mailetcontext | Mail (Mail1322620955282-2b903c2d-8378-4a10-848b-6ebd489182bf-to-eggsist.com) sent successfully to ASPMX.L.GOOGLE.com. at 126.96.36.199 from bounce+t.eichmann=eggsist.com@???.com for [email@example.com]
There are a couple of mails that get re-queued 25 times (default limit for attempts) due to DNS issues (like no MX entry for a domain etc.) until they are bounced in the system. But I assume James should be able to handle such mails and new ones in the queue without problems.
Is there a way to examine mails that are queued and their status? Like how many times are they already being re-queued, why are they still in the queue etc.? I'm really not sure how I can debug this issue as I'm new to James. Any help is very appreciated.