Hadoop HDFS / HDFS-15919

BlockPoolManager should log stack trace if unable to get Namenode addresses


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.3.1, 3.4.0, 3.2.3
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If the HDFS configuration is incorrect, the DataNode can fail to start with this stack trace:

      2021-03-24 05:58:27,026 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for nameservices: null
      2021-03-24 05:58:27,033 WARN  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode addresses.
      ...
      2021-03-24 05:58:27,077 ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
      java.io.IOException: No services to connect, missing NameNode address.
      	at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:165)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
      	at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
      

      In this case, the cause was an exception thrown in DFSUtil.getNNServiceRpcAddressesForCluster(...), but several scenarios within that method can throw an exception, so it's difficult to figure out what is wrong with the config.

      We should simply add the exception to the existing log message when an error occurs, so that it is clear what caused it.
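
A minimal sketch of the proposed change, not the committed patch. It uses java.util.logging so that it runs without Hadoop on the classpath. The class RefreshLoggingSketch, the helper lookupAddresses(), and its error message are hypothetical stand-ins for BlockPoolManager and DFSUtil.getNNServiceRpcAddressesForCluster(...). The point is only that the caught exception is passed to the logger along with the existing message.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RefreshLoggingSketch {
  // Logger name chosen to mirror the log lines above; an assumption for illustration.
  static final Logger LOG = Logger.getLogger("datanode.DataNode");

  // Hypothetical stand-in for the address lookup; it always fails in this sketch.
  static Map<String, String> lookupAddresses() throws IOException {
    throw new IOException("simulated configuration error");
  }

  static Map<String, String> refresh() {
    try {
      return lookupAddresses();
    } catch (IOException ioe) {
      // Passing the exception as the last argument attaches it to the log
      // record, so the underlying cause and its stack trace are not lost.
      LOG.log(Level.WARNING, "Unable to get NameNode addresses.", ioe);
      return Collections.emptyMap();
    }
  }

  public static void main(String[] args) {
    // The warning, including the exception's stack trace, goes to stderr.
    Map<String, String> addresses = refresh();
    System.out.println("addresses found: " + addresses.size());
  }
}
```

With the exception attached, the WARN entry itself carries the root cause instead of only the generic message that precedes the later "No services to connect" failure.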

          People

            Assignee: Stephen O'Donnell (sodonnell)
            Reporter: Stephen O'Donnell (sodonnell)
            Votes: 0
            Watchers: 5