Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Not A Bug
Description
We have seen that upgrading the Linux kernel to address a specific CVE
( https://access.redhat.com/security/vulnerabilities/stackguard ) can cause a datanode crash.
We observed this issue while upgrading the kernel from 3.10.0-514.6.2 to 3.10.0-514.21.2.
The original kernel fix is here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1be7107fbe18eed3e319a6c3e83c78254b693acb
The datanode fails with the following stack trace:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f458d078b7c, pid=13214, tid=139936990349120
#
# JRE version: (8.0_40-b25) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.40-b25 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# j java.lang.Object.<clinit>()V+0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid13214.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
The root cause is a failure in jsvc. Passing a stack size argument greater than 1 MB mitigates the problem, for example:
exec "$JSVC" \
    -Xss2m org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
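Since the mitigation above depends on the stack size being greater than 1 MB, a startup script could sanity-check the configured value before calling jsvc. The following is only a rough sketch: `xss_to_bytes` and `xss_exceeds_1mb` are hypothetical helper names, not part of Hadoop or jsvc, and the sketch assumes the value uses the usual JVM size suffixes (k, m, g, or plain bytes).

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop or jsvc), shown for illustration only.

# Convert a JVM-style size such as "2m", "1024k" or "1048576" to bytes.
# Assumes the input is a plain integer with an optional k/m/g suffix.
xss_to_bytes() {
    value=$1
    case "$value" in
        *[kK]) echo $(( ${value%[kK]} * 1024 )) ;;
        *[mM]) echo $(( ${value%[mM]} * 1024 * 1024 )) ;;
        *[gG]) echo $(( ${value%[gG]} * 1024 * 1024 * 1024 )) ;;
        *)     echo "$value" ;;
    esac
}

# Return success only when the stack size is strictly greater than 1 MB
# (1048576 bytes), the threshold named in the description.
xss_exceeds_1mb() {
    [ "$(xss_to_bytes "$1")" -gt 1048576 ]
}
```

A script might then call `xss_exceeds_1mb 2m` and print a warning when the check fails, rather than refusing to start.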
This JIRA tracks potential fixes for this problem. The larger stack size might increase the datanode's memory usage, and we don't have data on how that affects other applications running on the datanode.
Issue Links
- is related to: DAEMON-364 Latest RHEL kernel update crashes jsvc (Resolved)