Key: INFRA-716
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Jeff Turner
Votes: 0
Watchers: 1
Infrastructure

issues.apache.org segfaulting due to APR-enabled mod_proxy_ajp

Created: 07/Feb/06 05:35 PM   Updated: 25/Sep/06 03:49 AM
Component/s: JIRA
Security Level: public (Regular issues)

Time Tracking:
Not Specified

File Attachments:
server.zip (Zip Archive, 17 kB), attached 2006-02-17 03:24 AM by Remy Maucherat; licensed for inclusion in ASF works
Environment:
issues.apache.org:
Tomcat 5.5.15
libapr-1.so.0.2.2
java version "1.4.2_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_10-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.4.2_10-b03, mixed mode)

Description:
On issues.apache.org we are using Tomcat + mod_proxy_ajp built to use APR. We were getting infrequent (about once a day) errors:

Feb 4, 2006 8:53:35 AM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Critical poller failure (APR does not understand this error code), restarting poller

and then yesterday Tomcat crashed with:

Feb 6, 2006 4:01:56 PM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Critical poller failure (APR does not understand this error code), restarting poller
Feb 6, 2006 4:01:57 PM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Unexpected poller error
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.tomcat.util.net.AprEndpoint$Poller.destroy(AprEndpoint.java:984)
    at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1091)
    at java.lang.Thread.run(Thread.java:534)
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0x2000000001861280, pid=7220, tid=2305843011479877840
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.4.2_10-b03 mixed mode)
# Problematic frame:
# C [libapr-1.so.0+0x31280] apr_pollset_add+0x140
#
# An error report file with more information is saved as hs_err_pid7220.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
#

Tomcat's stdout and hs_err.log file are in people.apache.org/~jefft/tomcatcrash/

Remy Maucherat added a comment - 17/Feb/06 03:24 AM
I have identified a rare situation that can trigger this crash and that matches the traces in the server logs. I recommend patching the Tomcat instance with the attached patch (extract it in the Tomcat folder).

Jeff Turner added a comment - 17/Feb/06 07:25 AM
Another crash today. It looks like this may be triggered by the JVM running out of memory. In the original crash logs, JIRA had only 2.7 MB free:

2006-02-06 16:01:56,470 INFO [jira.web.filters.AccessLogFilter] - http://issues.apache.org/jira/secure/IssueNavigator.jspa 6713-3967 2717

and in the latest log, JIRA had 3.2 MB free:

Feb 16, 2006 3:24:54 PM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Critical poller failure (APR does not understand this error code), restarting poller
2006-02-16 15:25:05,947 INFO [jira.web.filters.AccessLogFilter] - http://issues.apache.org/jira/secure/ReleaseNote.jspa 338+1013 41575
2006-02-16 15:25:05,947 INFO [jira.web.filters.AccessLogFilter] - http://issues.apache.org/jira/secure/ReleaseNote.jspa 338+1013 41575
2006-02-16 15:25:09,171 INFO [jira.web.filters.AccessLogFilter] - http://issues.apache.org/jira/browse/LUCENE 12348-9065 21847
2006-02-16 15:25:09,171 INFO [jira.web.filters.AccessLogFilter] - http://issues.apache.org/jira/browse/LUCENE 12348-9065 21847
Feb 16, 2006 3:25:17 PM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Critical poller failure (APR does not understand this error code), restarting poller
Feb 16, 2006 3:25:17 PM org.apache.tomcat.util.net.AprEndpoint$Poller run
SEVERE: Unexpected poller error
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.tomcat.util.net.AprEndpoint$Poller.destroy(AprEndpoint.java:984)
        at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1091)
        at java.lang.Thread.run(Thread.java:534)
#
# An unexpected error has been detected by HotSpot Virtual Machine:
#
# SIGSEGV (0xb) at pc=0x2000000001861280, pid=10297, tid=2305843011480926416
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.4.2_10-b03 mixed mode)
# Problematic frame:
# C [libapr-1.so.0+0x31280] apr_pollset_add+0x140
#
# An error report file with more information is saved as hs_err_pid10297.log
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp


I've put these latest logs on people.apache.org in ~jefft/tomcatcrash/segfault_16Feb.tar.gz
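
The free-memory reading quoted above can be reproduced with a small helper. This is a minimal sketch, assuming that the last whitespace-separated field of an AccessLogFilter line is the free memory in KB (which is how the 2717 in the first log line maps to "2.7 MB"); the class name, method names, and threshold are hypothetical and not part of JIRA or Tomcat.

```java
// Hypothetical helper, not part of JIRA or Tomcat.
class AccessLogFreeMemory {

    // Assumption: the final whitespace-separated field is free memory in KB.
    static long lastFieldKb(String line) {
        String trimmed = line.trim();
        int idx = trimmed.lastIndexOf(' ');
        // If there is no space, idx is -1 and the whole line is parsed.
        return Long.parseLong(trimmed.substring(idx + 1));
    }

    // True when the reported free memory falls below the given threshold.
    static boolean isLow(String line, long thresholdKb) {
        return lastFieldKb(line) < thresholdKb;
    }
}
```

For the 2006-02-06 16:01:56 line quoted above, `lastFieldKb` returns 2717. If the field layout differs from this assumption, the helper needs adjusting accordingly.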

Remy Maucherat added a comment - 17/Feb/06 07:51 AM
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.tomcat.util.net.AprEndpoint$Poller.destroy(AprEndpoint.java:984)

It's still the same issue, and the patch I attached will resolve it. The crash happens because, after the ArrayIndexOutOfBoundsException occurs, a destroyed (i.e. deallocated) socket is added to the poller.

The JVM may be running out of memory, which could be a problem in its own right, but it isn't directly related to this crash. There's still a big oops in the code that needs to be patched.
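
As a rough illustration of the failure mode described above (this is not the actual AprEndpoint code or the attached patch), the sketch below models a cleanup routine that throws partway through and leaves an already freed socket in the queue. Every name here is hypothetical, and the off-by-one loop bound is chosen only to reproduce an index of -1. In native code, handing such a socket to the pollset could segfault; the model only reports whether that would happen.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not Tomcat's AprEndpoint: names and logic are illustrative only.
class PollerSketch {
    private final long[] queue;                       // sockets waiting to be polled
    private int count;                                // number of valid entries in queue
    private final Set<Long> freed = new HashSet<>();  // handles already deallocated

    PollerSketch(int capacity) {
        queue = new long[capacity];
    }

    void enqueue(long socket) {
        queue[count++] = socket;
    }

    // Fragile cleanup: frees the queued sockets, then reads index -1 and throws,
    // so the reset below never runs and the freed sockets stay queued.
    void destroyFragile() {
        for (int i = count - 1; i >= -1; i--) {       // deliberate off-by-one bound
            freed.add(queue[i]);
        }
        count = 0;                                    // skipped when the loop throws
    }

    // Guarded cleanup: the queue is reset even if freeing a socket fails.
    void destroyGuarded() {
        try {
            for (int i = count - 1; i >= 0; i--) {
                freed.add(queue[i]);
            }
        } finally {
            count = 0;
        }
    }

    // True if a freed handle is still queued and would reach the native pollset.
    boolean wouldPollFreedSocket() {
        for (int i = 0; i < count; i++) {
            if (freed.contains(queue[i])) {
                return true;
            }
        }
        return false;
    }
}
```

After `destroyFragile()` throws, `wouldPollFreedSocket()` returns true for any socket that was queued; after `destroyGuarded()`, it returns false. Resetting state in a `finally` block is one possible guard, not necessarily the approach the patch takes.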

Noel J. Bergman added a comment - 17/Feb/06 07:51 AM
I have applied Remy's patch, and restarted JIRA.

Remy Maucherat added a comment - 17/Feb/06 08:05 AM
Note that, although the crash will most likely be fixed by my patch, there's still an issue that may be related to low memory: ~jefft/tomcatcrash/segfault_16Feb.tar.gz shows very frequent unexpected poller-related errors (which, BTW, is a *great* way to run into my dumb coding error, which might otherwise have gone unnoticed). We'll have to investigate why this happens, as it will degrade performance.

Remy Maucherat added a comment - 07/Mar/06 05:09 PM
It seems the segfault issues have been fixed by the patch, so this issue can be closed.

Jeff Turner added a comment - 07/Mar/06 05:20 PM
Yes, it seems fixed. Thanks.