
Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: rbf
    • Environment: HDFS 3.3.0, Java 11

    Description

      Problem:
      When the active NameNode is restarted and is loading the fsimage, DFSRouters slow down significantly.

      Investigation:
      While the restarted active NameNode is loading the fsimage, RouterRpcClient receives a SocketException when it tries to connect. Because RouterRpcClient#isUnavailableException(IOException) returns false for a SocketException, MembershipNamenodeResolver#cacheNS is not refreshed. As a result, the order of the NameNodes returned by MembershipNamenodeResolver#getNamenodesForNameserviceId(String) is unchanged, and the restarting NameNode is still returned first. RouterRpcClient therefore keeps trying to connect to the NameNode that is loading the fsimage.

      Once the fsimage is loaded, the NameNode throws StandbyException. That exception is treated as an "unavailable" exception, so cacheNS is refreshed and the Routers recover.
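
      The behaviour described above can be illustrated with a small, self-contained sketch. It is a simplified model based only on this description, not the actual Hadoop code: ResolverSketch, demote, firstAfterFailure, the local StandbyException class, and the treatSocketErrorsAsUnavailable flag are all hypothetical names introduced for illustration. The flag shows one possible direction for a fix and is not necessarily the change that was committed.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the Router behaviour described in this issue. */
public class RouterFailoverSketch {

    /** Local stand-in for Hadoop's StandbyException so the sketch has no dependencies. */
    static class StandbyException extends IOException {
        StandbyException(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for the resolver's cached NameNode ordering (cacheNS). */
    static class ResolverSketch {
        private final List<String> cachedOrder = new ArrayList<>();

        ResolverSketch(String first, String second) {
            cachedOrder.add(first);
            cachedOrder.add(second);
        }

        List<String> getNamenodes() {
            return new ArrayList<>(cachedOrder);
        }

        /** Models a cache refresh: the failed NameNode moves to the end of the list. */
        void demote(String failed) {
            if (cachedOrder.remove(failed)) {
                cachedOrder.add(failed);
            }
        }
    }

    /**
     * Models the check described above. With the flag off, a SocketException is
     * not considered "unavailable"; with the flag on (an assumed fix), it is.
     */
    static boolean isUnavailableException(IOException e, boolean treatSocketErrorsAsUnavailable) {
        if (e instanceof StandbyException) {
            return true;
        }
        return treatSocketErrorsAsUnavailable && e instanceof SocketException;
    }

    /** Returns the NameNode that would be tried first on the next call. */
    static String firstAfterFailure(IOException failure, boolean treatSocketErrorsAsUnavailable) {
        ResolverSketch resolver = new ResolverSketch("nn1", "nn2");
        String tried = resolver.getNamenodes().get(0);
        if (isUnavailableException(failure, treatSocketErrorsAsUnavailable)) {
            resolver.demote(tried);
        }
        return resolver.getNamenodes().get(0);
    }

    public static void main(String[] args) {
        // While nn1 loads the fsimage: the order is unchanged, so nn1 is retried.
        System.out.println(firstAfterFailure(new SocketException("connection reset"), false));
        // After loading: StandbyException triggers the refresh, so nn2 comes first.
        System.out.println(firstAfterFailure(new StandbyException("standby"), false));
        // Assumed fix: SocketException also triggers the refresh.
        System.out.println(firstAfterFailure(new SocketException("connection reset"), true));
    }
}
```

      In this model the first case keeps returning nn1, which corresponds to the slowdown: every request first hits the NameNode that cannot serve it yet.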

      Workaround:
      Instead of restarting the NameNode in one step, stop it, wait 1 minute, and then start it again.

            People

              Assignee: Akira Ajisaka (aajisaka)
              Reporter: Akira Ajisaka (aajisaka)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h