Hadoop Common / HADOOP-3758

Excessive exceptions in HDFS namenode log file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.17.1
    • Fix Version/s: 0.17.2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I upgraded a big cluster, but 10 of its nodes did not get upgraded.
      The namenode log then filled with exceptions until it consumed the entire partition; in this case the namenode generated a log file of close to 700 GB.

      1. HADOOP-3758-trunk.patch
        2 kB
        Lohit Vijayarenu
      2. HADOOP-3758-17.patch
        2 kB
        Lohit Vijayarenu
      3. HADOOP-3758-trunk.patch
        2 kB
        Lohit Vijayarenu
      4. HADOOP-3758-18.patch
        2 kB
        Lohit Vijayarenu

          Activity

          Hudson added a comment -

          Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

          Chris Douglas added a comment -

          I just committed this. Thanks, Lohit

          Raghu Angadi added a comment -

          +1. Looks good.

          Lohit Vijayarenu added a comment -

          Patch for 0.17.2

          Lohit Vijayarenu added a comment -

          While the patch is waiting on Hudson, I ran the tests on my Linux box. All tests pass, including test-patch. There is no test case for this change; I tested it manually.

          Lohit Vijayarenu added a comment - - edited

          Patch for trunk. I tested this by changing the layout version and then starting a datanode that connects to the namenode. The datanode fails with IncorrectVersionException.

          Lohit Vijayarenu added a comment -

          Patch for 0.18

          Raghu Angadi added a comment -

          In addition, the DN should update the lastHeartBeat time even if sendHeartbeat() results in an exception. This will avoid similar problems with future errors.
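
A minimal sketch of the suggestion above, assuming a simplified heartbeat loop; it is not the committed patch. `HeartbeatLoop`, `HeartbeatSender`, and `heartbeatIfDue` are hypothetical names, and only the idea of recording the attempt time regardless of the outcome comes from the comment. Because the timestamp is updated whether or not the call throws, a failing datanode retries once per interval instead of in a tight loop.

```java
// Hypothetical, simplified model of a datanode heartbeat loop.
class HeartbeatLoop {
    // Stand-in for the RPC call to the namenode; may throw on any error.
    interface HeartbeatSender {
        void sendHeartbeat() throws Exception;
    }

    private final long intervalMs;
    private long lastHeartbeat = 0;   // time of the last attempt, in ms
    private int attempts = 0;

    HeartbeatLoop(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    // Sends a heartbeat if one is due. Returns true if an attempt was made.
    boolean heartbeatIfDue(long nowMs, HeartbeatSender sender) {
        if (nowMs - lastHeartbeat < intervalMs) {
            return false;             // not due yet
        }
        // Record the attempt before the call so that a failure still
        // delays the next attempt by a full interval.
        lastHeartbeat = nowMs;
        attempts++;
        try {
            sender.sendHeartbeat();
        } catch (Exception e) {
            System.err.println("heartbeat failed: " + e.getMessage());
        }
        return true;
    }

    int getAttempts() {
        return attempts;
    }
}
```

With a sender that always throws, two calls one millisecond apart result in a single attempt; the second attempt happens only after the interval has elapsed.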

          Raghu Angadi added a comment -

          That's pretty painful. We should include "IncorrectVersionException" as one of the fatal exceptions at the datanode.

          See DataNode.java:offerService():

                } catch(RemoteException re) {
                  String reClass = re.getClassName();
                  if (UnregisteredDatanodeException.class.getName().equals(reClass) ||
                      DisallowedDatanodeException.class.getName().equals(reClass)) {
                    LOG.warn("DataNode is shutting down: " + 
                             StringUtils.stringifyException(re));
                    shutdown();
                    return;
                  }
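
One way to read this suggestion, as a self-contained sketch rather than the committed patch: add the exception's class name to the set that the catch block treats as fatal. `RemoteFailure` and `FatalRemoteErrors` are hypothetical stand-ins so the example compiles without Hadoop on the classpath; the `org.apache.hadoop.dfs` prefix comes from the stack trace in this issue and is assumed to apply to the other two exception classes as well.

```java
import java.util.Set;

// Hypothetical stand-in for Hadoop's RemoteException: it carries only the
// class name of the exception that was thrown on the namenode side.
class RemoteFailure extends Exception {
    private final String className;

    RemoteFailure(String className, String message) {
        super(message);
        this.className = className;
    }

    String getClassName() {
        return className;
    }
}

class FatalRemoteErrors {
    // Remote exception classes after which retrying is pointless.
    // The package prefix is an assumption based on the namenode stack trace.
    private static final Set<String> FATAL = Set.of(
        "org.apache.hadoop.dfs.UnregisteredDatanodeException",
        "org.apache.hadoop.dfs.DisallowedDatanodeException",
        "org.apache.hadoop.dfs.IncorrectVersionException");

    // Returns true when the datanode should shut down instead of retrying.
    static boolean isFatal(RemoteFailure re) {
        return FATAL.contains(re.getClassName());
    }

    public static void main(String[] args) {
        RemoteFailure re = new RemoteFailure(
            "org.apache.hadoop.dfs.IncorrectVersionException",
            "Unexpected version of data node. Reported: -11. Expecting = -13.");
        if (isFatal(re)) {
            System.out.println("DataNode is shutting down: " + re.getMessage());
        }
    }
}
```

Keeping the fatal class names in one set means a future fatal error needs only one new entry rather than another `||` clause.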
          
          Jim Huang added a comment - - edited

          In the span of 11 seconds, 103,367 exceptions were generated by only 10 unique nodes that had incorrect versions.
          Here are the repetitive log entries that ate up all the disk space:

          2008-06-26 22:48:18,952 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 8020, call sendHeartbeat(A.B.C.D:50010, 2971509878784, 1731424256, 2150426691690, 0, 0) from A.B.C.D:43226: error: org.apache.hadoop.dfs.IncorrectVersionException: Unexpected version of data node. Reported: -11. Expecting = -13.
          org.apache.hadoop.dfs.IncorrectVersionException: Unexpected version of data node. Reported: -11. Expecting = -13.
                  at org.apache.hadoop.dfs.NameNode.verifyVersion(NameNode.java:682)
                  at org.apache.hadoop.dfs.NameNode.verifyRequest(NameNode.java:669)
                  at org.apache.hadoop.dfs.NameNode.sendHeartbeat(NameNode.java:557)
                  at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
                  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
          2008-06-26 22:48:18,953 INFO org.apache.hadoop.ipc.Server: IPC Server handler 27 on 8020, call sendHeartbeat(A.B.C.E:50010, 2971509878784, 1993637888, 2151120783483, 0, 0) from A.B.C.E:56503: error: org.apache.hadoop.dfs.IncorrectVersionException: Unexpected version of data node. Reported: -11. Expecting = -13.
          org.apache.hadoop.dfs.IncorrectVersionException: Unexpected version of data node. Reported: -11. Expecting = -13.
                  at org.apache.hadoop.dfs.NameNode.verifyVersion(NameNode.java:682)
                  at org.apache.hadoop.dfs.NameNode.verifyRequest(NameNode.java:669)
                  at org.apache.hadoop.dfs.NameNode.sendHeartbeat(NameNode.java:557)
                  at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                  at java.lang.reflect.Method.invoke(Method.java:597)
                  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
                  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
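
For a sense of scale: 103,367 exceptions in 11 seconds is about 9,400 per second, or roughly 940 per second from each of the 10 nodes. The following sketch, assuming the log line format shown above, tallies such entries per datanode address; `HeartbeatErrorCounter` and `countByNode` are hypothetical names, not part of Hadoop.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeartbeatErrorCounter {
    // Matches the IPC server line for a failed sendHeartbeat call and
    // captures the caller address after "from", without the ephemeral port.
    private static final Pattern LINE = Pattern.compile(
        "call sendHeartbeat\\(.*\\) from ([^:\\s]+):\\d+: error: .*IncorrectVersionException");

    // Counts IncorrectVersionException heartbeat failures per datanode address.
    // Stack trace lines do not match the pattern and are ignored.
    static Map<String, Integer> countByNode(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the lines of a namenode log (for example via `Files.readAllLines`) gives one counter per misbehaving datanode, which makes it easy to confirm that a handful of nodes account for the whole flood.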
          

            People

            • Assignee:
              Lohit Vijayarenu
            • Reporter:
              Jim Huang
            • Votes:
              0
            • Watchers:
              0
