mod_jk 1.2.25 introduces JK_REPLY_TIMEOUT as a new error status. In 1.2.25, when reply_timeout expires while waiting for a reply from Tomcat, JK_REPLY_TIMEOUT is returned with the HTTP error code JK_HTTP_GATEWAY_TIME_OUT (504) in ajp_service@jk_ajp_common.c (lines 2038, 2107).

Currently, lb_worker performs a fallback (sending the request to another worker) only when the HTTP error code is JK_HTTP_SERVER_BUSY (see service@jk_lb_worker.c, line 1101). So when JK_HTTP_GATEWAY_TIME_OUT is returned with the JK_REPLY_TIMEOUT status, the value of rc is always JK_FALSE and lb_worker doesn't try the next worker. 1.2.23 can fall back to the next worker because JK_HTTP_SERVER_BUSY is always returned when a reply timeout occurs.

My proposal: when a reply timeout happens and op->recoverable is set, return JK_HTTP_SERVER_BUSY as the HTTP error code with status JK_REPLY_TIMEOUT, instead of JK_HTTP_GATEWAY_TIME_OUT, so that lb_worker handles the fallback.

Thanks in advance.
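To make the proposal concrete, here is a minimal sketch of the intended behaviour. It is not the attached patch and not code from jk_ajp_common.c or jk_lb_worker.c: the helper names reply_timeout_http_status() and lb_should_try_next() are hypothetical, and the lb check is reduced to the single condition described above (fail over only on JK_HTTP_SERVER_BUSY).

```c
/* Constants named as in this report; 503 and 504 are the standard
 * HTTP status values for "Service Unavailable" and "Gateway Timeout". */
#define JK_FALSE 0
#define JK_TRUE 1
#define JK_HTTP_SERVER_BUSY 503
#define JK_HTTP_GATEWAY_TIME_OUT 504

/* Hypothetical helper: pick the HTTP error code to report when a
 * reply timeout occurs. With the proposal, a recoverable operation
 * reports 503 so that the lb can fail over; otherwise 504 is kept. */
static int reply_timeout_http_status(int recoverable)
{
    return recoverable ? JK_HTTP_SERVER_BUSY : JK_HTTP_GATEWAY_TIME_OUT;
}

/* Simplified stand-in for the lb decision described in the report:
 * the next worker is tried only when the error code is 503. */
static int lb_should_try_next(int is_error)
{
    return (is_error == JK_HTTP_SERVER_BUSY) ? JK_TRUE : JK_FALSE;
}
```

With this mapping, a reply timeout on a recoverable operation would reach the lb as 503 and trigger the fallback, while a non-recoverable one would still be reported as 504.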
Created attachment 20721: proposal patch.
Thank you for analyzing this problem. Yes, reply timeouts should allow retries/failover, at least unless recovery_options disables them.

The interface between the service() method of an lb member and the lb itself consists of the service() return code and the additional is_error, which is meant to indicate the HTTP return code. The lb needs to decide whether it should do a failover and whether the member needs to be put into error state. The interface is not really rich enough to help with these decisions. Either we end up using more fine-grained return codes from service(), or we add recoverability (= failover) and member error info as side effects, in addition to is_error.

I'm actively investigating this. As a first step, I added some code comments documenting which return codes to expect from the service() methods. Please stay tuned.
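As a rough illustration of the second option (carrying recoverability and member error info alongside is_error), the decision could look like the sketch below. Everything here is hypothetical: service_result_t, lb_action_t, lb_decide() and lb_member_needs_error_state() are invented names for discussion, not an existing or planned mod_jk interface.

```c
#define JK_FALSE 0
#define JK_TRUE 1

/* Hypothetical record of what a member's service() call reports. */
typedef struct {
    int rc;           /* service() return code: JK_TRUE or JK_FALSE */
    int is_error;     /* HTTP status to send to the client */
    int recoverable;  /* non-zero: the lb may fail over to another member */
    int member_error; /* non-zero: the member should go into error state */
} service_result_t;

typedef enum { LB_DONE, LB_FAILOVER, LB_FAIL } lb_action_t;

/* Failover decision: no longer inferred from the HTTP status alone. */
static lb_action_t lb_decide(const service_result_t *r)
{
    if (r->rc == JK_TRUE)
        return LB_DONE;
    return r->recoverable ? LB_FAILOVER : LB_FAIL;
}

/* Member state decision: kept independent of the failover decision. */
static int lb_member_needs_error_state(const service_result_t *r)
{
    return (r->rc != JK_TRUE && r->member_error) ? JK_TRUE : JK_FALSE;
}
```

The point of separating the two flags is that the HTTP status reported to the client (for example 504 on a reply timeout) would no longer have to double as the failover signal.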