Hadoop Common: HADOOP-6284

Any hadoop commands crashing jvm (SIGBUS) when /tmp (tmpfs) is full

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: scripts
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to hadoop-config.sh so that it allows setting java command options for JAVA_PLATFORM.

      Description

      [knoguchi@ ~]$ df /tmp
      Filesystem           1K-blocks      Used Available Use% Mounted on
      tmpfs                   524288    524288         0 100% /tmp
      [knoguchi@ ~]$ hadoop dfs -ls 
      #
      # An unexpected error has been detected by Java Runtime Environment:
      #
      #  SIGBUS (0x7) at pc=0x00824077, pid=19185, tid=4160617360
      #
      # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
      # Problematic frame:
      # C  [libc.so.6+0x6e077]  memset+0x37
      #
      # An error report file with more information is saved as:
      # /homes/knoguchi/hs_err_pid19185.log
      #
      # If you would like to submit a bug report, please visit:
      #   http://java.sun.com/webapps/bugreport/crash.jsp
      #
      Aborted
      [knoguchi@ ~]$ 
      

      This does not happen when /tmp is not on tmpfs.
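
      A minimal way to reproduce this class of crash, assuming a disposable machine where /tmp can be remounted (the mount size and filler file below are illustrative, not from the issue):

        # Mount a small tmpfs on /tmp and fill it completely.
        sudo mount -t tmpfs -o size=512m tmpfs /tmp
        dd if=/dev/zero of=/tmp/filler bs=1M    # runs until tmpfs reports 100% full
        df /tmp                                 # confirm 0 blocks available
        hadoop dfs -ls /                        # any JVM launch now dies with SIGBUS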

      1. hadoop-6284-patch-v1.txt
        0.7 kB
        Koji Noguchi
      2. HADOOP-6284-y0.20.1.patch
        0.6 kB
        Koji Noguchi


          Activity

          Koji Noguchi added a comment -

          Reproducing this error: the JVM crashes while trying to create a file under /tmp/hsperfdata_knoguchi.

          [pid 17137] open("/tmp/hsperfdata_knoguchi/17135", O_RDWR|O_CREAT|O_TRUNC, 0600) = 3
          [pid 17137] ftruncate(3, 32768) = 0
          [pid 17137] mmap2(NULL, 32768, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0xf7fb817c) = 0xfffffffff7fec000
          [pid 17137] close(3) = 0
          [pid 17137] — SIGBUS (Bus error) @ 0 (0) —

          Since /tmp is tmpfs, the open and ftruncate themselves succeed (the file is extended without any blocks being reserved), which confuses the jvm: the SIGBUS only arrives later, when it writes to the mmap'd region and tmpfs has no page left to back it.
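
          This is easy to see from the shell; the probe file name below is made up, and this should only be run on a deliberately filled tmpfs:

            # On a full tmpfs, create + truncate still succeed: pages are allocated lazily.
            : > /tmp/probe
            truncate -s 32768 /tmp/probe && echo "metadata ops OK"
            # An actual write is what fails (ENOSPC here; the JVM's mmap'd write gets SIGBUS).
            dd if=/dev/zero of=/tmp/probe bs=4096 count=1 conv=notrunc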

          It would have been nice if we could point the JVM at a different /tmp, but this is hard-coded in java.
          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6447182

          It is suggested there that
          "One workaround would be to disable the temporary mapping of hsperfdata file by using "-XX:-UsePerfData". "

          This can almost be done by setting HADOOP_CLIENT_OPTS, but we also have this in the hadoop script:

            JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} -Xmx32m org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`
          

          which also fails when /tmp is full.

          Can we have a way to set options for this command, or hardcode "-XX:-UsePerfData" into the line above?

          We have had a couple of incidents where one user fills up /tmp and all hadoop commands from that node start failing.
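
          For context, this is roughly how the 0.20-era bin/hadoop assembles the final command for client invocations; the lines are a paraphrase (treat the exact variable names as assumptions), and they show why HADOOP_CLIENT_OPTS only reaches the final JVM, never the earlier JAVA_PLATFORM probe:

            # Client commands fold HADOOP_CLIENT_OPTS into the general options...
            HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
            # ...but only this final exec sees them; the JAVA_PLATFORM probe has already run.
            exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS -classpath "$CLASSPATH" $CLASS "$@"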

          Tsz Wo Nicholas Sze added a comment -

          How about we add a HADOOP_JVM_OPTS?

          Koji Noguchi added a comment -

          How about we add a HADOOP_JVM_OPTS?

          I only want this set for the 'JAVA_PLATFORM=`CLASSPATH=... ${JAVA} ...`' command.
          (Since UsePerfData looks to be required for Java tools to connect to the jvm.)

          HADOOP_JVM_OPTS sounds too general for that.
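
          That requirement is real: the hsperfdata file under /tmp is what local tools like jps and jstat use to discover running JVMs. A quick illustration (the jar name is hypothetical):

            # A JVM started with PerfData disabled writes no /tmp/hsperfdata_<user>/<pid>,
            # so jps cannot discover it, even though the process itself runs fine.
            java -XX:-UsePerfData -jar some-long-running.jar &
            jps    # the process started above will be missing from this listing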

          Koji Noguchi added a comment -

          It's a silly patch, but it introduces a new env variable, HADOOP_JAVA_PLATFORM_OPTS.
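
          The attached patch isn't quoted inline here; assuming it simply threads the new variable into the existing probe line, hadoop-config.sh would look something like this (a sketch, not the verbatim diff):

            # hadoop-config.sh: allow callers to pass extra JVM options to the platform probe.
            JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} ${HADOOP_JAVA_PLATFORM_OPTS} -Xmx32m org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`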

          First, with no option set:

          [knoguchi@ ~]$ df /tmp
          Filesystem           1K-blocks      Used Available Use% Mounted on
          tmpfs                   524288    524288         0 100% /tmp
          
          [knoguchi@ ~]$ hadoop dfs -ls /
          #
          # An unexpected error has been detected by Java Runtime Environment:
          #
          #  SIGBUS (0x7) at pc=0x00824077, pid=12811, tid=4160617360
          #
          # Java VM: Java HotSpot(TM) Server VM (10.0-b22 mixed mode linux-x86)
          # Problematic frame:
          # C  [libc.so.6+0x6e077]  memset+0x37
          #
          # An error report file with more information is saved as:
          # /homes/knoguchi/hs_err_pid12811.log
          #
          # If you would like to submit a bug report, please visit:
          #   http://java.sun.com/webapps/bugreport/crash.jsp
          #
          Abort
          

          Next, setting only HADOOP_CLIENT_OPTS:

          [knoguchi@ ~]$ setenv HADOOP_CLIENT_OPTS '-XX:-UsePerfData'
          [knoguchi@ ~]$ $HADOOP_HOME/bin/hadoop dfs -ls /
          Exception in thread "main" java.lang.NoClassDefFoundError: #_An_unexpected_error_has_been_detected_by_Java_Runtime_Environment:
          Caused by: java.lang.ClassNotFoundException: #_An_unexpected_error_has_been_detected_by_Java_Runtime_Environment:
                  at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
                  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
                  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
                  at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
                  at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
          
          

          This is because hadoop ends up executing
          java -Xmx1000m -Djava.library.path=/.../hadoop/bin/../lib/native/# #_An_unexpected_error_has_been_detected_by_Java_Runtime_Environment: #

          (basically, JAVA_PLATFORM became a long error message: the probe JVM's crash banner goes to stdout, the backticks capture it, and sed turns its spaces into underscores before it is spliced into the library path)

          and then, with the new variable:

          [knoguchi@ ~]$ setenv HADOOP_JAVA_PLATFORM_OPTS '-XX:-UsePerfData'
          [knoguchi@ ~]$ $HADOOP_HOME/bin/hadoop dfs -ls /   
          Found 10 items
          drwx------   - ...
          

          works.

          I'm reluctant to put -XX:-UsePerfData directly in the hadoop script, since I don't know when Java will stop supporting this option.
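
          For what it's worth, one way to check whether a given JVM still recognizes the flag (PrintFlagsFinal is a HotSpot diagnostic switch; output format varies across versions, so treat this as a sketch):

            # If UsePerfData shows up in the flag dump, the JVM still supports it.
            java -XX:+PrintFlagsFinal -version 2>/dev/null | grep UsePerfData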

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          I will wait for a few days before committing this to see whether anyone has comments.

          Konstantin Boudnik added a comment -

          It sounds kind of strange that an internal HotSpot flag intended for pre-1.4.1 HotSpot versions can make such a difference, but perhaps it does (see the HotSpot Monitoring Tools and Utilities section here). I think a bug needs to be filed against HotSpot for this...

          Koji Noguchi added a comment -

          I think a bug needs to be filed against HotSpot for this...

          I'll look into it. But in the meantime, this Jira is asking for a way to pass options to the JAVA_PLATFORM command.
          (We could have done the same for the "-Xmx32m" of HADOOP-5564 as well.)
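
          As a sketch of that last point (hypothetical; not what was committed), the existing -Xmx32m could have been folded into a user-overridable default of the same variable:

            # Hypothetical: default the probe options so users can override heap and flags alike.
            HADOOP_JAVA_PLATFORM_OPTS=${HADOOP_JAVA_PLATFORM_OPTS:-"-Xmx32m"}
            JAVA_PLATFORM=`CLASSPATH=${CLASSPATH} ${JAVA} ${HADOOP_JAVA_PLATFORM_OPTS} org.apache.hadoop.util.PlatformName | sed -e "s/ /_/g"`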

          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, Koji!

          Tsz Wo Nicholas Sze added a comment -

          The "Fix Version/s" should be 0.22 but it is currently missing.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #53 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/53/)
          HADOOP-6284. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to hadoop-config.sh so that it allows setting java command options for JAVA_PLATFORM. Contributed by Koji Noguchi

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #113 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/113/)
          HADOOP-6284. Add a new parameter, HADOOP_JAVA_PLATFORM_OPTS, to hadoop-config.sh so that it allows setting java command options for JAVA_PLATFORM. Contributed by Koji Noguchi

          Koji Noguchi added a comment -

          Patch for 0.20. (not meant for commit)

          Koji Noguchi added a comment -

          FYI, we deployed the fix with the '-XX:-UsePerfData' config change to our clusters, only to find out that this option makes each jvm hang for about 4 seconds when shutting down...
          A single ls call (java_platform probe + dfsclient) used to take less than 0.1 seconds; after the change it took 7-8 seconds. We ended up reverting the config change and are now changing the /tmp configuration instead.
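
          The comment doesn't say what the /tmp change was; one plausible shape for it, shown purely as an illustration (the size is made up), is giving the tmpfs more headroom:

            # Illustrative only: enlarge the tmpfs backing /tmp on the fly
            # (persist the new size in /etc/fstab to survive reboots).
            sudo mount -o remount,size=2g /tmp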

          Eli Collins added a comment -

          Koji,
          Do you need this in 0.23 or 2.0? Per HADOOP-8033 it is no longer available.


            People

            • Assignee: Koji Noguchi
            • Reporter: Koji Noguchi
            • Votes: 0
            • Watchers: 4
