Bug 38806 - Disabled workers in mod_jk are not retried once they get in error state
Status: RESOLVED FIXED
Alias: None
Product: Tomcat Connectors
Classification: Unclassified
Component: Common
Version: unspecified
Hardware: Other All
Importance: P2 normal
Target Milestone: ---
Assignee: Tomcat Developers Mailing List
URL:
Keywords: PatchAvailable
Depends on:
Blocks:
 
Reported: 2006-02-28 13:02 UTC by Ruediger Pluem
Modified: 2008-10-05 03:09 UTC

Attachments
Patch against 1.2.15 (1.07 KB, patch)
2006-02-28 13:03 UTC, Ruediger Pluem

Description Ruediger Pluem 2006-02-28 13:02:20 UTC
I use mod_jk 1.2.15 in a failover configuration with session stickiness:

# List of available workers
worker.list=failover
 
# Master worker
# Take care that the jvmRoute attribute in the Engine tag is set to master
# for the Tomcat addressed by MASTER_HOST and MASTER_PORT
worker.master.port=MASTER_PORT
worker.master.host=MASTER_HOST
worker.master.type=ajp13
worker.master.cachesize=10
worker.master.cache_timeout=600
worker.master.socket_keepalive=1
worker.master.prepost_timeout=300
worker.master.reply_timeout=120000
worker.master.recovery_options=3
# redirect to backup if master fails
worker.master.redirect=backup

# Backup worker for failover
# Take care that the jvmRoute attribute in the Engine tag is set to backup
# for the Tomcat addressed by BACKUP_HOST and BACKUP_PORT
worker.backup.port=BACKUP_PORT
worker.backup.host=BACKUP_HOST
worker.backup.type=ajp13
worker.backup.cachesize=10
worker.backup.cache_timeout=600
worker.backup.socket_keepalive=1
worker.backup.prepost_timeout=300
worker.backup.reply_timeout=120000
worker.backup.recovery_options=3
# Set worker to disabled. This means it gets only requests in the case that
# - The session route points to this worker
# - In the failover case (see redirect setting for master above)
worker.backup.disabled=1

# Failover worker
worker.failover.type=lb
worker.failover.balanced_workers=master, backup

Once I get a session from the backup worker, the session stays on this disabled
worker, which is correct and expected. But if the backup server goes into error
state, it never recovers from that state, because disabled workers are not
retried. This is bad when the disabled worker was chosen because of
session stickiness. The attached patch fixes this.
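
To illustrate how a disabled worker still ends up serving requests, here is a minimal toy model in C. It is not the mod_jk code: the toy_worker struct and both function names are hypothetical, and it only mirrors the rule stated in the configuration comments (a disabled worker is skipped for new requests but is still selected when the session route names it).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified worker record; not the mod_jk structure. */
typedef struct {
    const char *name;   /* assumed to match the jvmRoute of the Tomcat instance */
    int is_disabled;    /* models worker.<name>.disabled=1 */
} toy_worker;

/* Sticky lookup: the worker named by the session route is returned
 * even if it is disabled. Returns NULL when no worker matches. */
const toy_worker *find_by_route(const toy_worker *ws, size_t n, const char *route)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ws[i].name, route) == 0)
            return &ws[i];
    }
    return NULL;
}

/* Request without a session route: disabled workers are skipped. */
const toy_worker *find_for_new_request(const toy_worker *ws, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!ws[i].is_disabled)
            return &ws[i];
    }
    return NULL;
}
```

With workers {"master", 0} and {"backup", 1}, a request carrying the route "backup" is sent to the disabled backup worker, while a request without a route goes to master. That is why the recovery of a disabled worker matters for sticky sessions.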
Comment 1 Ruediger Pluem 2006-02-28 13:03:09 UTC
Created attachment 17807
Patch against 1.2.15
Comment 2 Mladen Turk 2006-02-28 13:18:22 UTC
Right, it makes sense to retry the disabled worker also.
Try changing
#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_disabled && !(w)->is_busy)
to:
#define JK_WORKER_IN_ERROR(w) ((w)->in_error_state && !(w)->is_busy)

Your patch only addresses the byreq lb methods, while the others should be
treated in the same way.
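
The effect of dropping the is_disabled check can be seen in a small standalone sketch. It assumes the macro is what decides whether a failed worker is considered for retry, which this thread implies but does not show. The toy_state struct and the *_BEFORE / *_AFTER names are hypothetical; only the three flag names come from the macro above.

```c
#include <stdbool.h>

/* Hypothetical stand-in that models only the three flags used by the macro. */
typedef struct {
    bool in_error_state;
    bool is_disabled;
    bool is_busy;
} toy_state;

/* Original condition: a disabled worker never counts as "in error". */
#define IN_ERROR_BEFORE(w) \
    ((w)->in_error_state && !(w)->is_disabled && !(w)->is_busy)

/* Proposed condition: the disabled flag no longer matters. */
#define IN_ERROR_AFTER(w) \
    ((w)->in_error_state && !(w)->is_busy)

/* Function wrappers so the two conditions can be compared side by side. */
bool retry_candidate_before(const toy_state *w) { return IN_ERROR_BEFORE(w); }
bool retry_candidate_after(const toy_state *w)  { return IN_ERROR_AFTER(w); }
```

For a worker that is disabled, in error state, and not busy, retry_candidate_before yields false and retry_candidate_after yields true. For an enabled worker the two conditions agree.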
Comment 3 Ruediger Pluem 2006-02-28 17:05:22 UTC
Ok. I just wasn't sure if adjusting JK_WORKER_IN_ERROR was the right thing to
do, so I limited the change to find_bysession_route. Do we really care about
disabled workers in find_best_byrequests, find_best_bytraffic and
get_most_suitable_worker (here only the one worker case)? I don't think so.
Comment 4 Mladen Turk 2006-02-28 17:57:34 UTC
Right, we don't care about disabled workers in the single-worker case,
because a disabled single worker is an oxymoron.

Anyhow, adjusting JK_WORKER_IN_ERROR should do the trick.
I really cannot remember why I put that check there in the first place.
Comment 5 Mladen Turk 2006-02-28 18:21:08 UTC
Fixed in SVN.
Thanks for spotting that.