HDFS-17356

RBF: Add Configuration dfs.federation.router.ns.name Optimization


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: dfs, rbf
    • Labels: None

    Description

          When RBF (Router-Based Federation) is enabled in HDFS, and the HDFS server processes (NameNode, ZKFC) run on the same node as the RBF client and share its configuration, the NameNode fails to start with the exception shown below.

          The cause is that the nameservice (NS) of the Router service has been added to the dfs.nameservices list. At startup, the NameNode tries to determine which NS the current node belongs to. It finds multiple candidate NS entries, cannot tell which one is its own, and fails the existing validation logic, so startup is aborted.

          Currently, the only workaround is to keep separate hdfs-site.xml files for the Router client and the NameNode. Splitting the configuration this way works against unified management of the cluster configuration, so we propose a better solution.

      2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
      2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
      2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
      2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
      2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
      2023-10-30 15:53:24,868 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
      org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
              at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
              at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
              at org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
      2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
      2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 

       

      hdfs-site.xml

      <property>
        <name>dfs.nameservices</name>
        <value>mycluster1,mycluster2,ns-fed</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.ns-fed</name>
        <value>r1</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.ns-fed.r1</name>
        <value>node1.com:8888</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster1</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster1.nn1</name>
        <value>node1.com:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster1.nn2</name>
        <value>node2.com:50070</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster2</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster2.nn1</name>
        <value>node3.com:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster2.nn2</name>
        <value>node4.com:50070</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.ns-fed</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.client.failover.random.order</name>
        <value>true</value>
      </property> 

       

      Solution

      Add a dfs.federation.router.ns.name property to hdfs-site.xml that marks the name of the Router NS, and filter out that NS during NameNode and ZKFC startup so that it no longer triggers this failure.
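
      The idea can be illustrated with a small, self-contained model. This is only a sketch, not the actual DFSUtil code or the proposed patch: the class RouterNsFilterSketch, the method resolveLocalNameservice, and the map of nameservice to RPC host are hypothetical, and the RPC hosts of mycluster1 and mycluster2 are assumed (the configuration above lists only their HTTP addresses). The routerNs argument stands in for the value that dfs.federation.router.ns.name would carry, here ns-fed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RouterNsFilterSketch {

    /**
     * Returns the single nameservice whose RPC host equals localHost.
     * Any nameservice equal to routerNs is skipped; pass null to disable
     * the filter. Throws if more than one nameservice matches.
     */
    public static String resolveLocalNameservice(Map<String, String> rpcHostByNs,
                                                 String localHost,
                                                 String routerNs) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : rpcHostByNs.entrySet()) {
            if (e.getKey().equals(routerNs)) {
                continue; // proposed behaviour: ignore the Router nameservice
            }
            if (e.getValue().equals(localHost)) {
                matches.add(e.getKey());
            }
        }
        if (matches.size() > 1) {
            throw new IllegalArgumentException(
                "Configuration has multiple addresses that match local node's address: "
                    + matches);
        }
        return matches.isEmpty() ? null : matches.get(0);
    }

    public static void main(String[] args) {
        Map<String, String> rpcHostByNs = new LinkedHashMap<>();
        rpcHostByNs.put("mycluster1", "node1.com"); // assumed RPC host for nn1
        rpcHostByNs.put("mycluster2", "node3.com"); // assumed RPC host for nn1
        rpcHostByNs.put("ns-fed", "node1.com");     // Router r1 at node1.com:8888

        try {
            // No filter: both mycluster1 and ns-fed match node1.com.
            resolveLocalNameservice(rpcHostByNs, "node1.com", null);
        } catch (IllegalArgumentException ex) {
            System.out.println("without filter: " + ex.getMessage());
        }

        // With the Router NS filtered out, only mycluster1 remains.
        System.out.println("with filter: "
            + resolveLocalNameservice(rpcHostByNs, "node1.com", "ns-fed"));
    }
}
```

      In this model the unfiltered lookup on node1.com is ambiguous because both mycluster1 and ns-fed resolve to the local host, while skipping the Router NS leaves a single match.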

      Attachments

        1. image-2024-01-29-18-04-55-391.png
          113 kB
          xiaojunxiang
        2. image-2024-01-29-22-09-43-263.png
          143 kB
          xiaojunxiang
        3. image-2024-01-31-10-56-23-399.png
          92 kB
          xiaojunxiang
        4. screenshot-1.png
          377 kB
          xiaojunxiang
        5. screenshot-2.png
          1.40 MB
          xiaojunxiang
        6. screenshot-3.png
          20 kB
          xiaojunxiang
        7. screenshot-4.png
          113 kB
          xiaojunxiang
        8. screenshot-5.png
          163 kB
          xiaojunxiang

        People

            Assignee: Unassigned
            Reporter: wangzhihui (hiwangzhihui)
            Votes: 0
            Watchers: 5