Apache Ozone / HDDS-3586

OM HA can be started with 3 isolated LEADERs instead of one OM ring

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Target Version/s:

      Description

      Steps to reproduce:

      Imagine that I have 3 different OMs with the following DNS names:

      ozone-om-0.ozone-om
      ozone-om-1.ozone-om
      ozone-om-2.ozone-om
      

      I configured the three hosts as follows:

        OZONE-SITE.XML_ozone.om.nodes.omservice: om1,om2,om3
        OZONE-SITE.XML_ozone.om.address.omservice.om1: ozone-om-0
        OZONE-SITE.XML_ozone.om.address.omservice.om2: ozone-om-1
        OZONE-SITE.XML_ozone.om.address.omservice.om3: ozone-om-2
        OZONE-SITE.XML_ozone.om.ratis.enable: "true"
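      The configuration above lists three node IDs under the service ID omservice and gives each node its own address key. The sketch below expands it into a node-ID-to-host map so the intended three-member ring is explicit. It only mirrors the key naming shown above (ozone.om.nodes.<serviceId> and ozone.om.address.<serviceId>.<nodeId>); treating OZONE-SITE.XML_ as a plain prefix to strip is an assumption about the container tooling, and the class and method names are hypothetical, not Ozone's own parsing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OmServiceConfigSketch {

    // Prefix used by the environment-style configuration in the issue (assumed to map to ozone-site.xml).
    static final String PREFIX = "OZONE-SITE.XML_";

    /** Keeps only OZONE-SITE.XML_ entries and strips the prefix, yielding ozone-site.xml style keys. */
    static Map<String, String> toSiteConfig(Map<String, String> env) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                conf.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return conf;
    }

    /** Returns nodeId -> configured address for every node listed in ozone.om.nodes.<serviceId>. */
    static Map<String, String> omNodes(Map<String, String> conf, String serviceId) {
        Map<String, String> nodes = new LinkedHashMap<>();
        String list = conf.getOrDefault("ozone.om.nodes." + serviceId, "");
        for (String raw : list.split(",")) {
            String nodeId = raw.trim();
            if (nodeId.isEmpty()) {
                continue;
            }
            // A node without an address key is stored with a null value so it is not silently lost.
            nodes.put(nodeId, conf.get("ozone.om.address." + serviceId + "." + nodeId));
        }
        return nodes;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("OZONE-SITE.XML_ozone.om.nodes.omservice", "om1,om2,om3");
        env.put("OZONE-SITE.XML_ozone.om.address.omservice.om1", "ozone-om-0");
        env.put("OZONE-SITE.XML_ozone.om.address.omservice.om2", "ozone-om-1");
        env.put("OZONE-SITE.XML_ozone.om.address.omservice.om3", "ozone-om-2");
        env.put("OZONE-SITE.XML_ozone.om.ratis.enable", "true");

        Map<String, String> nodes = omNodes(toSiteConfig(env), "omservice");
        System.out.println(nodes); // {om1=ozone-om-0, om2=ozone-om-1, om3=ozone-om-2}
    }
}
```

      Every host receives the same three-node list, so every host is expected to join the same ring.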
      

      Unfortunately, DNS is not reliable: each host can resolve only its own LOCAL hostname.
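      The code quoted from OMHANodeDetails.java below filters on java.net.InetSocketAddress.isUnresolved(). Per the JDK documentation, new InetSocketAddress(host, port) attempts a lookup and flags the address as unresolved if the lookup fails. The sketch below shows how the two kinds of address behave under that guard; createUnresolved stands in for a failed DNS lookup, the port number is arbitrary, and the class name is hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

class ResolutionCheckSketch {

    /** True if the address would pass the `!addr.isUnresolved()` guard in OMHANodeDetails. */
    static boolean passesGuard(InetSocketAddress addr) {
        return !addr.isUnresolved();
    }

    public static void main(String[] args) {
        // Stand-in for a peer hostname that the local DNS cannot resolve.
        InetSocketAddress peer = InetSocketAddress.createUnresolved("ozone-om-1", 9862);
        // Stand-in for the local host, which always resolves.
        InetSocketAddress local = new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862);

        System.out.println("peer passes guard: " + passesGuard(peer));   // false
        System.out.println("local passes guard: " + passesGuard(local)); // true
    }
}
```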

      OMHANodeDetails.java ignores ALL the configuration entries whose address cannot be resolved:

        // Unresolvable addresses skip both branches below and are silently ignored.
        if (!addr.isUnresolved()) {
          if (!isPeer && OmUtils.isAddressLocal(addr)) {
            localRpcAddress = addr;
            localOMServiceId = serviceId;
            localOMNodeId = nodeId;
            localRatisPort = ratisPort;
            found++;
          } else {
            // This OMNode belongs to same OM service as the current OMNode.
            // Add it to peerNodes list.
            peerNodesList.add(getHAOMNodeDetails(conf, serviceId,
                nodeId, addr, ratisPort));
          }
        }
      

      As a result, I get 3 running servers, but each has its own one-node Ratis ring (peerNodesList is empty because only the local hostname can be resolved).
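      The outcome can be reproduced with a simplified, self-contained model of the loop quoted above, run from the point of view of ozone-om-0. The resolver function and the hostname-based isLocalHost predicate are hypothetical stand-ins for the DNS lookup and OmUtils.isAddressLocal, and the Result type is not an Ozone class; only the branch structure follows the excerpt.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

class PeerDiscoverySketch {

    /** Outcome of scanning the configured OM nodes from one host's point of view. */
    static final class Result {
        String localNodeId;
        int found;
        final List<String> peers = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    /** Unresolved addresses are skipped, the resolved local address is the local node, other resolved ones are peers. */
    static Result scan(Map<String, String> nodes,
                       Function<String, InetSocketAddress> resolver,
                       Predicate<String> isLocalHost) {
        Result r = new Result();
        for (Map.Entry<String, String> e : nodes.entrySet()) {
            InetSocketAddress addr = resolver.apply(e.getValue());
            if (!addr.isUnresolved()) {
                if (isLocalHost.test(e.getValue())) {
                    r.localNodeId = e.getKey();
                    r.found++;
                } else {
                    r.peers.add(e.getKey());
                }
            } else {
                // The real code has no else branch; the node is dropped. Recorded here only for visibility.
                r.skipped.add(e.getKey());
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("om1", "ozone-om-0");
        nodes.put("om2", "ozone-om-1");
        nodes.put("om3", "ozone-om-2");

        String self = "ozone-om-0";
        // Hypothetical resolver: only the local hostname resolves, as described in the issue.
        Function<String, InetSocketAddress> resolver = host -> host.equals(self)
                ? new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862)
                : InetSocketAddress.createUnresolved(host, 9862);

        Result r = scan(nodes, resolver, self::equals);
        System.out.println("local=" + r.localNodeId + " found=" + r.found
                + " peers=" + r.peers + " skipped=" + r.skipped);
        // local=om1 found=1 peers=[] skipped=[om2, om3]
    }
}
```

      Because found is 1, the local node is identified correctly and nothing looks wrong from inside the process, even though two of the three configured members were discarded.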

      The Ratis group ID is the same for all of them, but each has a separate database and works as an independent OM, which is VERY dangerous.
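      Why each instance becomes LEADER follows from the standard Raft majority rule (general Raft behavior, not Ozone-specific code), applied to the ring sizes involved:

```latex
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1

\text{Intended ring: } n = 3 \;\Rightarrow\; q(3) = 1 + 1 = 2 \quad \text{(a candidate needs one other vote)}

\text{Isolated rings: } n = 1 \;\Rightarrow\; q(1) = 0 + 1 = 1 \quad \text{(the OM's own vote is enough)}
```

      With n = 1 every OM wins its own election immediately, so all three processes act as leaders and accept writes into their own databases with no replication between them.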

      1. Option one: accept unresolved addresses and retry creating the connection if it cannot be established.

      2. Option two: at the very least, fix the error handling. When I configure 3 OMs, there should be 3 OMs.
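      Both options can be sketched in a few lines. This is not the fix that was merged for this issue; it is a minimal illustration under stated assumptions: requireAllResolved and resolveWithRetry are hypothetical helpers, the resolver is injected so DNS can be faked, and the retry count and delay are arbitrary. In a real deployment each retry would build a new InetSocketAddress(host, port), which triggers a fresh lookup subject to the JVM's DNS caching (networkaddress.cache.negative.ttl).

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class OmStartupChecksSketch {

    /** Option two: fail startup if any configured OM address is unresolved, instead of silently shrinking the ring. */
    static void requireAllResolved(Map<String, InetSocketAddress> configured) {
        List<String> unresolved = new ArrayList<>();
        for (Map.Entry<String, InetSocketAddress> e : configured.entrySet()) {
            if (e.getValue().isUnresolved()) {
                unresolved.add(e.getKey() + "=" + e.getValue().getHostString());
            }
        }
        if (!unresolved.isEmpty()) {
            throw new IllegalStateException(configured.size() + " OM node(s) configured but "
                    + unresolved.size() + " could not be resolved: " + unresolved);
        }
    }

    /** Option one: keep the peer and retry resolution a bounded number of times before giving up. */
    static InetSocketAddress resolveWithRetry(String host,
                                              Function<String, InetSocketAddress> resolver,
                                              int maxAttempts, long sleepMillis)
            throws InterruptedException {
        InetSocketAddress addr = resolver.apply(host);
        for (int attempt = 1; attempt < maxAttempts && addr.isUnresolved(); attempt++) {
            Thread.sleep(sleepMillis);
            addr = resolver.apply(host);
        }
        return addr; // may still be unresolved; the caller decides whether that is fatal
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical flaky resolver: fails on the first two calls, then resolves.
        int[] calls = {0};
        Function<String, InetSocketAddress> flaky = h -> ++calls[0] < 3
                ? InetSocketAddress.createUnresolved(h, 9862)
                : new InetSocketAddress(InetAddress.getLoopbackAddress(), 9862);

        InetSocketAddress peer = resolveWithRetry("ozone-om-1", flaky, 5, 10);
        System.out.println("resolved after " + calls[0] + " attempts: " + !peer.isUnresolved());
        // resolved after 3 attempts: true

        try {
            requireAllResolved(Map.of("om2", InetSocketAddress.createUnresolved("ozone-om-1", 9862)));
        } catch (IllegalStateException e) {
            System.out.println("startup refused: " + e.getMessage());
        }
    }
}
```

      Bounding the retries keeps a permanently misconfigured hostname from hanging startup forever; combining the two (retry first, then fail loudly) avoids both the silent one-node ring and an endless wait.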

              People

              • Assignee: Hanisha Koneru (hanishakoneru)
              • Reporter: Marton Elek (elek)