  1. Apache Ozone
  2. HDDS-421

Resilient DNS resolution in datanode-service

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.1
    • Component/s: Ozone Datanode
    • Labels:
      None

      Description

      When I start big clusters on Kubernetes, I regularly hit the following error:

      If the DNS name of the SCM is not yet available while the datanode boots, the datanode won't connect to the SCM. It tries to reconnect, but the DNS resolution is not repeated.

      The problem is in InitDatanodeState.call(). It calls getSCMAddresses, which creates the InetSocketAddress instances using the Hadoop utilities. While creating an InetSocketAddress, the Hadoop utilities try to resolve the address and store the result in the InetSocketAddress.
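
      The behavior described above can be illustrated with plain java.net classes. This is a minimal sketch, not the Ozone code: the class name ResolutionDemo, the helper needsRetry, the host name and the port are all made up for illustration. It relies only on the documented fact that an InetSocketAddress records at construction time whether it is resolved, and that isUnresolved() reports this state afterwards.

```java
import java.net.InetSocketAddress;

public class ResolutionDemo {
    /** Hypothetical helper: true when the address carries no resolved IP. */
    static boolean needsRetry(InetSocketAddress address) {
        return address.isUnresolved();
    }

    public static void main(String[] args) {
        // An IP literal needs no DNS lookup, so this address is resolved.
        InetSocketAddress literal = new InetSocketAddress("127.0.0.1", 9861);

        // createUnresolved skips the lookup entirely; it stands in here for
        // a host whose DNS record did not exist yet when the address was built.
        InetSocketAddress pending =
            InetSocketAddress.createUnresolved("scm.example.invalid", 9861);

        System.out.println("literal needs retry: " + needsRetry(literal));
        System.out.println("pending needs retry: " + needsRetry(pending));
    }
}
```

      The object is immutable, so an address that was unresolved when it was built stays unresolved; a new instance has to be created to trigger another lookup.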

      The address could be unresolved, but InitDatanodeState.call() will start to use it anyway (connectionManager.addSCMServer), and there won't be any later attempt to resolve it.

      My small proposal is to return immediately if any of the SCM addresses is unresolved, so that the main loop of the DatanodeStateMachine tries again (including the DNS resolution part).
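
      The proposal can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch that was committed: ScmAddressRetry, resolveAll and attemptsUntilResolved are hypothetical names, and the resolver is passed in as a function so the retry logic can be exercised without real DNS. The point is only that every attempt builds fresh InetSocketAddress objects, so the lookup is repeated.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class ScmAddressRetry {
    /**
     * Builds one address per host. Returns Optional.empty() as soon as any
     * address is unresolved, so the caller can give up on this attempt.
     */
    static Optional<List<InetSocketAddress>> resolveAll(
            List<String> hosts, int port,
            BiFunction<String, Integer, InetSocketAddress> resolver) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (String host : hosts) {
            InetSocketAddress address = resolver.apply(host, port);
            if (address.isUnresolved()) {
                return Optional.empty(); // return immediately, retry later
            }
            resolved.add(address);
        }
        return Optional.of(resolved);
    }

    /**
     * Stand-in for the outer state machine loop: repeats the whole
     * resolution step until it succeeds or maxAttempts is reached.
     * Returns the attempt number that succeeded, or -1 if none did.
     * A real loop would wait between attempts; that is omitted here.
     */
    static int attemptsUntilResolved(
            List<String> hosts, int port,
            BiFunction<String, Integer, InetSocketAddress> resolver,
            int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (resolveAll(hosts, port, resolver).isPresent()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Default resolver: the constructor performs the DNS lookup.
        BiFunction<String, Integer, InetSocketAddress> dns =
            (host, port) -> new InetSocketAddress(host, port);
        System.out.println(
            attemptsUntilResolved(List.of("127.0.0.1"), 9861, dns, 5));
    }
}
```

      Because resolveAll never hands a partially resolved list to the caller, nothing downstream (such as the connection manager) can start using an unresolved address.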

            People

            • Assignee:
              elek Marton Elek
              Reporter:
              elek Marton Elek
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved: