Hadoop HDFS / HDFS-15919

BlockPoolManager should log stack trace if unable to get Namenode addresses


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.3.1, 3.4.0, 3.2.3
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      If the HDFS configuration is incorrect, the DataNode can fail to start with the following log output and stack trace:

      2021-03-24 05:58:27,026 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for nameservices: null
      2021-03-24 05:58:27,033 WARN  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode addresses.
      ...
      2021-03-24 05:58:27,077 ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
      java.io.IOException: No services to connect, missing NameNode address.
      	at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:165)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
      	at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
      

      In this case, the root cause was an exception thrown in DFSUtil.getNNServiceRpcAddressesForCluster(...). However, several different conditions inside that method can throw an exception, and the existing warning does not say which one occurred, so it is difficult to work out what is wrong with the configuration.

      We should simply attach the exception to the existing log message when an error occurs, so that its cause is clear.
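A minimal sketch of the proposed change, written in Java to match the codebase. It is not the actual patch: it assumes a simplified, hypothetical resolver (resolveAddresses, with two invented checks on dfs.nameservices and dfs.namenode.rpc-address.<nameservice>) standing in for DFSUtil.getNNServiceRpcAddressesForCluster(...), and it uses java.util.logging's Logger.log(Level, String, Throwable) instead of the DataNode's own logger so that it has no dependencies. The only point it illustrates is that passing the caught exception as the Throwable argument puts the underlying cause, with its stack trace, into the log next to the existing warning.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical, simplified sketch of the refresh path described in this issue.
 * This is not Hadoop code: the resolver and its failure checks are invented.
 */
final class RefreshNamenodesSketch {
  // Hypothetical logger; java.util.logging stands in for the DataNode's real logger.
  static final Logger LOG = Logger.getLogger("datanode.DataNode");

  // Hypothetical stand-in for DFSUtil.getNNServiceRpcAddressesForCluster(conf).
  // It can fail for more than one reason, each with its own exception message.
  static Map<String, String> resolveAddresses(Map<String, String> conf) throws IOException {
    String nameservice = conf.get("dfs.nameservices");
    if (nameservice == null) {
      throw new IOException("dfs.nameservices is not set");
    }
    String rpcAddress = conf.get("dfs.namenode.rpc-address." + nameservice);
    if (rpcAddress == null) {
      throw new IOException("No RPC address configured for nameservice " + nameservice);
    }
    return Collections.singletonMap(nameservice, rpcAddress);
  }

  static Map<String, String> refreshNamenodes(Map<String, String> conf) throws IOException {
    Map<String, String> addresses = null;
    try {
      addresses = resolveAddresses(conf);
    } catch (IOException ioe) {
      // Before: only the message was logged, hiding which check failed:
      //   LOG.warning("Unable to get NameNode addresses.");
      // After: pass the exception so its message and stack trace are logged too.
      LOG.log(Level.WARNING, "Unable to get NameNode addresses.", ioe);
    }
    if (addresses == null || addresses.isEmpty()) {
      throw new IOException("No services to connect, missing NameNode address.");
    }
    return addresses;
  }
}
```

With an empty configuration, refreshNamenodes(...) still fails with "No services to connect, missing NameNode address.", but the preceding WARNING record now carries the IOException "dfs.nameservices is not set" and its stack trace. With an SLF4J logger, the equivalent is passing the exception as the last argument, e.g. LOG.warn("Unable to get NameNode addresses.", ioe).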

        Attachments

        1. HDFS-15919.001.patch (0.8 kB, Stephen O'Donnell)


            People

            • Assignee: Stephen O'Donnell (sodonnell)
            • Reporter: Stephen O'Donnell (sodonnell)
            • Votes: 0
            • Watchers: 5
