Hadoop HDFS / HDFS-12029

Data node process crashes after kernel upgrade

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      We have seen that upgrading the Linux kernel to address a specific CVE
      ( https://access.redhat.com/security/vulnerabilities/stackguard ) can cause the datanode to crash.

      We observed this issue while upgrading the kernel from version 3.10.0-514.6.2 to 3.10.0-514.21.2.

      The original kernel fix is here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1be7107fbe18eed3e319a6c3e83c78254b693acb

      The datanode fails with the following JVM fatal error report:

      # 
      # A fatal error has been detected by the Java Runtime Environment: 
      # 
      # SIGBUS (0x7) at pc=0x00007f458d078b7c, pid=13214, tid=139936990349120 
      # 
      # JRE version: (8.0_40-b25) (build ) 
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.40-b25 mixed mode linux-amd64 compressed oops) 
      # Problematic frame: 
      # j java.lang.Object.<clinit>()V+0 
      # 
      # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again 
      # 
      # An error report file with more information is saved as: 
      # /tmp/hs_err_pid13214.log 
      # 
      # If you would like to submit a bug report, please visit: 
      # http://bugreport.java.com/bugreport/crash.jsp 
      # 
      

      The root cause is a failure in jsvc. The crash can be mitigated by passing jsvc a stack size argument greater than 1 MB, for example:

      exec "$JSVC" \
        -Xss2m \
        org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
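
      To confirm that a given stack size option actually clears the 1 MB threshold mentioned above, a start script could run a small check before launching jsvc. The following is only a sketch: the helper names xss_to_bytes and xss_exceeds_1mb are hypothetical and are not part of Hadoop or jsvc, and it assumes the option is written in the usual -Xss<number>[k|m|g] form.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or jsvc): convert a stack size
# option such as -Xss2m, 512k, 1g, or a plain byte count into bytes.
xss_to_bytes() {
  local value number unit
  value="${1#-Xss}"                 # drop the -Xss prefix if present
  number="${value%[kKmMgG]}"        # numeric part without the unit suffix
  unit="${value#"$number"}"         # whatever follows the numeric part
  case "$number" in
    ''|*[!0-9]*) return 1 ;;        # reject empty or non-numeric input
  esac
  # Strip leading zeros so the shell does not treat the number as octal.
  while [ "${number#0}" != "$number" ] && [ -n "${number#0}" ]; do
    number="${number#0}"
  done
  case "$unit" in
    '')  echo "$number" ;;
    k|K) echo $(( number * 1024 )) ;;
    m|M) echo $(( number * 1024 * 1024 )) ;;
    g|G) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *)   return 1 ;;
  esac
}

# Hypothetical helper: exit status 0 when the option is strictly greater
# than 1 MB, 1 when it is not, and 2 when the option cannot be parsed.
xss_exceeds_1mb() {
  local bytes
  bytes="$(xss_to_bytes "$1")" || return 2
  [ "$bytes" -gt 1048576 ]
}
```

      For example, `xss_exceeds_1mb -Xss2m || echo "stack size is not above 1 MB"` prints nothing, while the same check with -Xss1m prints the warning, since 1 MB exactly is not greater than 1 MB.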
      

      This JIRA tracks potential fixes for this problem. We do not yet have data on how this change affects other applications that run on the datanode, as it might increase the datanode's memory usage.

            People

              Assignee: Nandakumar (nanda)
              Reporter: Anu Engineer (aengineer)
              Votes: 0
              Watchers: 14
