Bug 38806

Summary:	Disabled workers in mod_jk are not retried once they get in error state
Product:	Tomcat Connectors	Reporter:	Ruediger Pluem <rpluem>
Component:	Common	Assignee:	Tomcat Developers Mailing List <dev>
Status:	RESOLVED FIXED
Severity:	normal	Keywords:	PatchAvailable
Priority:	P2
Version:	unspecified
Target Milestone:	---
Hardware:	Other
OS:	All
Attachments:	Patch against 1.2.15

Description Ruediger Pluem 2006-02-28 13:02:20 UTC

I use mod_jk 1.2.15 in a failover configuration with session stickiness:

# List of available workers
worker.list=failover
 
# Master worker
# Take care that the jvmRoute attribute in the Engine tag is set to master
# for the Tomcat addressed by MASTER_HOST and MASTER_PORT
worker.master.port=MASTER_PORT
worker.master.host=MASTER_HOST
worker.master.type=ajp13
worker.master.cachesize=10
worker.master.cache_timeout=600
worker.master.socket_keepalive=1
worker.master.prepost_timeout=300
worker.master.reply_timeout=120000
worker.master.recovery_options=3
# redirect to backup if master fails
worker.master.redirect=backup

# Backup worker for failover
# Take care that the jvmRoute attribute in the Engine tag is set to backup
# for the Tomcat addressed by BACKUP_HOST and BACKUP_PORT
worker.backup.port=BACKUP_PORT
worker.backup.host=BACKUP_HOST
worker.backup.type=ajp13
worker.backup.cachesize=10
worker.backup.cache_timeout=600
worker.backup.socket_keepalive=1
worker.backup.prepost_timeout=300
worker.backup.reply_timeout=120000
worker.backup.recovery_options=3
# Set the worker to disabled. It then receives requests only when
# - the session route points to this worker
# - a failover occurs (see the redirect setting for master above)
worker.backup.disabled=1

# Failover worker
worker.failover.type=lb
worker.failover.balanced_workers=master, backup

Once I get a session from the backup worker, the session stays on this disabled
worker, which is correct and expected. But if the backup server goes into the
error state, it never recovers from it, because disabled workers are not retried.
This is bad when the disabled worker was chosen because of session stickiness.
The attached patch fixes this.

Comment 1 Ruediger Pluem 2006-02-28 13:03:09 UTC

Created attachment 17807
Patch against 1.2.15

Comment 2 Mladen Turk 2006-02-28 13:18:22 UTC

Right, it makes sense to retry the disabled worker as well.
Try changing
#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_disabled && !(w)->is_busy)
to:
#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_busy)

Your patch only addresses the byreq lb methods, while the others should be
treated in the same way.
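
To make the effect of the macro change concrete, here is a minimal sketch that evaluates both forms against a disabled worker in the error state. The struct and the `_OLD`/`_NEW` suffixes and helper functions are hypothetical stand-ins for illustration only; the real mod_jk worker record has more fields and is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the worker state; only the three flags
 * referenced by the macros are modeled. */
struct worker_state {
    bool in_error_state;
    bool is_disabled;
    bool is_busy;
};

/* Original form: the disabled flag masks the error state, so a
 * disabled worker is never reported as "in error". */
#define JK_WORKER_IN_ERROR_OLD(w) \
    ((w)->in_error_state && !(w)->is_disabled && !(w)->is_busy)

/* Suggested form: only the busy flag can mask the error state. */
#define JK_WORKER_IN_ERROR_NEW(w) \
    ((w)->in_error_state && !(w)->is_busy)

/* Thin wrappers so the two macros can be compared side by side. */
static bool old_sees_error(const struct worker_state *w)
{
    return JK_WORKER_IN_ERROR_OLD(w);
}

static bool new_sees_error(const struct worker_state *w)
{
    return JK_WORKER_IN_ERROR_NEW(w);
}
```

For a worker with in_error_state and is_disabled both set, old_sees_error() is false and new_sees_error() is true. Assuming the retry path is entered only when the macro is true, the old form would leave such a worker stuck in the error state.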

Comment 3 Ruediger Pluem 2006-02-28 17:05:22 UTC

Ok. I just wasn't sure if adjusting JK_WORKER_IN_ERROR was the right thing to
do, so I limited the change to find_bysession_route. Do we really care about
disabled workers in find_best_byrequests, find_best_bytraffic and
get_most_suitable_worker (here only the one worker case)? I don't think so.

Comment 4 Mladen Turk 2006-02-28 17:57:34 UTC

Right, we don't care about disabled workers in the single worker case,
because that would be an oxymoron.

Anyhow, adjusting JK_WORKER_IN_ERROR should do the trick.
I really cannot remember why I put that check there in the first place.
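
As a rough illustration of why the adjusted macro is enough, the sketch below gates a retry check on JK_WORKER_IN_ERROR. It is not the mod_jk source: the struct, the function usable_after_retry_check and the parameter recover_wait_seconds are assumptions made up for this example, and the real retry logic may differ in detail.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical worker record; field names beyond the three macro
 * flags are assumptions for this sketch. */
struct lb_worker {
    bool   in_error_state;
    bool   is_disabled;
    bool   is_busy;
    time_t error_time;   /* when the worker entered the error state */
};

/* Adjusted macro as proposed in this report. */
#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_busy)

/* Returns true if the worker may be tried for the current request.
 * A worker in error is given another chance once the (assumed)
 * recovery wait time has elapsed, whether or not it is disabled. */
static bool usable_after_retry_check(struct lb_worker *w, time_t now,
                                     int recover_wait_seconds)
{
    if (JK_WORKER_IN_ERROR(w)) {
        if (difftime(now, w->error_time) > recover_wait_seconds) {
            w->in_error_state = false;   /* allow one more attempt */
        }
    }
    return !w->in_error_state;
}
```

With the old macro, the outer condition would be false for a disabled worker, so in_error_state would never be cleared and a sticky session routed to that worker could not reach it again.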

Comment 5 Mladen Turk 2006-02-28 18:21:08 UTC

Fixed in SVN.
Thanks for spotting that.