Apache Ozone / HDDS-999

Make the DNS resolution in OzoneManager more resilient

Details

    Description

      If the OzoneManager (OM) is started before the SCM, the SCM's DNS name may not be resolvable yet. In this case the OM should retry and re-resolve the DNS name, but currently it throws an exception:

      2019-01-23 17:14:25 ERROR OzoneManager:593 - Failed to start the OzoneManager.
      java.net.SocketException: Call From om-0.om to null:0 failed on socket exception: java.net.SocketException: Unresolved address; For more details see:  http://wiki.apache.org/hadoop/SocketException
          at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
          at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
          at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
          at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
          at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
          at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:798)
          at org.apache.hadoop.ipc.Server.bind(Server.java:566)
          at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1042)
          at org.apache.hadoop.ipc.Server.<init>(Server.java:2815)
          at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:994)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:421)
          at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:342)
          at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:804)
          at org.apache.hadoop.ozone.om.OzoneManager.startRpcServer(OzoneManager.java:563)
          at org.apache.hadoop.ozone.om.OzoneManager.getRpcServer(OzoneManager.java:927)
          at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:265)
          at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:674)
          at org.apache.hadoop.ozone.om.OzoneManager.main(OzoneManager.java:587)
      Caused by: java.net.SocketException: Unresolved address
          at sun.nio.ch.Net.translateToSocketException(Net.java:131)
          at sun.nio.ch.Net.translateException(Net.java:157)
          at sun.nio.ch.Net.translateException(Net.java:163)
          at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:76)
          at org.apache.hadoop.ipc.Server.bind(Server.java:549)
          ... 11 more
      Caused by: java.nio.channels.UnresolvedAddressException
          at sun.nio.ch.Net.checkAddress(Net.java:101)
          at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
          at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
          ... 12 more

      This should be fixed. (See also HDDS-421, which fixed the same problem on the datanode side, and HDDS-907, which is a workaround until this issue is resolved.)
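The retry-and-re-resolve behaviour requested above could look roughly like the following. This is a minimal sketch, not the HDDS-999 patch: the class name `DnsRetry`, the method `resolveWithRetry`, and the attempt-count and sleep parameters are all hypothetical names introduced for illustration. It relies only on the standard `java.net.InetSocketAddress` behaviour of attempting resolution in its `(String, int)` constructor and reporting failure through `isUnresolved()`.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.function.Supplier;

/** Hypothetical helper: retries address resolution instead of failing on the first miss. */
public final class DnsRetry {
  private DnsRetry() { }

  /**
   * Calls the lookup repeatedly until it returns a resolved address
   * or maxAttempts lookups have been made.
   */
  public static InetSocketAddress resolveWithRetry(
      Supplier<InetSocketAddress> lookup, int maxAttempts, long sleepMillis)
      throws UnknownHostException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    InetSocketAddress addr = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      addr = lookup.get();
      if (!addr.isUnresolved()) {
        return addr;
      }
      // Wait before the next lookup, but not after the final attempt.
      if (attempt < maxAttempts) {
        Thread.sleep(sleepMillis);
      }
    }
    throw new UnknownHostException(
        "Still unresolved after " + maxAttempts + " attempts: " + addr);
  }

  /**
   * Convenience overload: constructs a new InetSocketAddress on every attempt,
   * which asks the JVM to resolve the host again. The JVM may still answer
   * from its own DNS cache, depending on its cache settings.
   */
  public static InetSocketAddress resolveWithRetry(
      String host, int port, int maxAttempts, long sleepMillis)
      throws UnknownHostException, InterruptedException {
    return resolveWithRetry(
        () -> new InetSocketAddress(host, port), maxAttempts, sleepMillis);
  }
}
```

The important detail is that a new `InetSocketAddress` is built on each attempt. An existing unresolved instance never becomes resolved on its own, so retrying with the same object would fail in the same way every time.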

      Attachments

        1. HDDS-999.01.patch (6 kB, Siddharth Wagle)

            People

              swagle Siddharth Wagle
              elek Marton Elek
              Votes: 0
              Watchers: 8

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m