Hadoop HDFS / HDFS-15919

BlockPoolManager should log stack trace if unable to get Namenode addresses

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.3.1, 3.4.0, 3.2.3
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If the HDFS configuration is invalid, the DataNode can fail to start with the following log output and stack trace:

      2021-03-24 05:58:27,026 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for nameservices: null
      2021-03-24 05:58:27,033 WARN  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode addresses.
      ...
      2021-03-24 05:58:27,077 ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
      java.io.IOException: No services to connect, missing NameNode address.
      	at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:165)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
      	at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
      

      In this case, the underlying problem was an exception thrown in DFSUtil.getNNServiceRpcAddressesForCluster(...). Several different scenarios within that method can raise an exception, and the existing warning ("Unable to get NameNode addresses.") does not say which one occurred, so it is difficult to work out what is wrong with the configuration.

      We should simply add the exception to the existing log message when an error occurs, so that it is clear what caused the failure.
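
      The idea can be illustrated with a minimal, self-contained sketch. It is not the attached patch: it uses java.util.logging rather than Hadoop's logger so that it runs without dependencies, and lookupAddresses is a hypothetical stand-in for DFSUtil.getNNServiceRpcAddressesForCluster(...). The class name and the "hypothetical cause" message are likewise assumptions made for illustration. The point is only that passing the caught exception to the logger keeps the root cause and its stack trace next to the warning.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RefreshLoggingSketch {
    static final Logger LOG = Logger.getLogger("RefreshLoggingSketch");

    // Hypothetical stand-in for the real address lookup, which can fail
    // for several different configuration reasons.
    static List<String> lookupAddresses(String nameservices) throws IOException {
        if (nameservices == null || nameservices.isEmpty()) {
            throw new IOException("Hypothetical cause: no nameservices configured");
        }
        return Arrays.asList(nameservices.split(","));
    }

    // Mirrors the flow seen in the log above: warn on lookup failure,
    // then fail with a generic error if no addresses were found.
    static List<String> refresh(String nameservices) throws IOException {
        List<String> addresses = new ArrayList<>();
        try {
            addresses = lookupAddresses(nameservices);
        } catch (IOException e) {
            // Passing the exception as the last argument attaches it to the
            // log record, so the root cause and its stack trace are printed.
            LOG.log(Level.WARNING, "Unable to get NameNode addresses.", e);
        }
        if (addresses.isEmpty()) {
            throw new IOException("No services to connect, missing NameNode address.");
        }
        return addresses;
    }

    public static void main(String[] args) {
        try {
            refresh(null);
        } catch (IOException e) {
            System.out.println("Startup failed: " + e.getMessage());
        }
    }
}
```

      With the exception attached, the warning is followed by the message and stack trace of whichever check failed inside the lookup, instead of leaving only the generic "missing NameNode address" error that is thrown later.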

      Attachments

        1. HDFS-15919.001.patch
          0.8 kB
          Stephen O'Donnell

          People

            Assignee: sodonnell Stephen O'Donnell
            Reporter: sodonnell Stephen O'Donnell