AMBARI-21614

Restart NFSGateway fails after ResourceManager move to another host

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.2
    • Component/s: None
    • Labels: None

    Description

      Test performed:

      1. Move ResourceManager to a different host
      2. Regenerate Keytabs
      3. Restart required services

      In build #180, the NFSGateway restart issued as part of "Restart
      required services" fails with the following error for both the
      Administrator and Cluster Administrator roles:

      2017-07-26 04:47:17,828 INFO nfs3.Nfs3Base (Nfs3Base.java:<init>(45)) - NFS server port set to: 2049
      2017-07-26 04:47:17,831 INFO oncrpc.RpcProgram (RpcProgram.java:<init>(99)) - Will accept client connections from unprivileged ports
      2017-07-26 04:47:17,839 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(1101)) - Login successful for user nfs/ctr-e134-1499953498516-54517-01-000003.hwx.site@EXAMPLE.COM using keytab file /etc/security/keytabs/nfs.service.keytab
      2017-07-26 04:47:18,785 INFO oncrpc.SimpleUdpServer (SimpleUdpServer.java:run(73)) - Started listening to UDP requests at port 4242 for Rpc program: mountd at localhost:4242 with workerCount 1
      2017-07-26 04:47:18,805 FATAL mount.MountdBase (MountdBase.java:startTCPServer(85)) - Failed to start the TCP server.
      org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:4242
      at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
      at org.apache.hadoop.oncrpc.SimpleTcpServer.run(SimpleTcpServer.java:88)
      at org.apache.hadoop.mount.MountdBase.startTCPServer(MountdBase.java:83)
      at org.apache.hadoop.mount.MountdBase.start(MountdBase.java:98)
      at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startServiceInternal(Nfs3.java:56)
      at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.startService(Nfs3.java:69)
      at org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter.start(PrivilegedNfsGatewayStarter.java:71)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
      Caused by: java.net.BindException: Address already in use
      at sun.nio.ch.Net.bind0(Native Method)
      at sun.nio.ch.Net.bind(Net.java:433)
      at sun.nio.ch.Net.bind(Net.java:425)
      at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
      at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
      at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
      at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
      at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
      at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      2017-07-26 04:47:18,828 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
      2017-07-26 04:47:18,831 INFO nfs3.Nfs3Base (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
      /************************************************************
      SHUTDOWN_MSG: Shutting down Nfs3 at ctr-e134-1499953498516-54517-01-000003.hwx.site/172.27.10.140
      ************************************************************/
      ==> /grid/0/log/hdfs/root/SecurityAuth.audit <==
      ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.4 <==
      ulimit -a for privileged nfs user cstm-hdfs
      core file size (blocks, -c) unlimited
      data seg size (kbytes, -d) unlimited
      scheduling priority (-e) 0
      file size (blocks, -f) unlimited
      pending signals (-i) 1030387
      max locked memory (kbytes, -l) unlimited
      max memory size (kbytes, -m) unlimited
      open files (-n) 65536
      pipe size (512 bytes, -p) 8
      POSIX message queues (bytes, -q) 819200
      real-time priority (-r) 0
      stack size (kbytes, -s) 8192
      cpu time (seconds, -t) unlimited
      max user processes (-u) unlimited
      virtual memory (kbytes, -v) unlimited
      file locks (-x) unlimited
      ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.3 <==
      ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.2 <==
      ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out.1 <==
      ==> /grid/0/log/hdfs/root/hadoop-cstm-hdfs-nfs3-ctr-e134-1499953498516-54517-01-000003.hwx.site.out <==
      (each of these four files contains the same "ulimit -a" output as shown for .out.4 above)

      Command failed after 1 tries
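
The `java.net.BindException: Address already in use` in the log above is raised when mountd tries to bind TCP port 4242 on `0.0.0.0`. As a rough diagnostic sketch (not part of the attached patch, and the helper name `tcp_port_in_use` is made up for illustration), the same condition can be checked on the NFSGateway host by attempting the bind from Python:

```python
import errno
import socket


def tcp_port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration only. It mirrors the bind that
    fails in the log; it does not identify which process holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Any other failure (e.g. permission denied) is not "in use";
            # surface it instead of guessing.
            raise
    return False


if __name__ == "__main__":
    # 4242 is the mountd port taken from the log above.
    print("TCP 4242 in use:", tcp_port_in_use(4242))
```

If this reports that the port is in use while NFSGateway is stopped, a tool such as `ss` or `netstat` can then show which process is holding it.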

      Live cluster environment: <https://172.27.18.145:8443> (lifetime extended for 48 hours)

      172.27.18.145 ctr-e134-1499953498516-54516-01-000007.hwx.site ctr-e134-1499953498516-54516-01-000007
      172.27.16.83 ctr-e134-1499953498516-54516-01-000006.hwx.site ctr-e134-1499953498516-54516-01-000006
      172.27.53.131 ctr-e134-1499953498516-54516-01-000005.hwx.site ctr-e134-1499953498516-54516-01-000005
      172.27.54.24 ctr-e134-1499953498516-54516-01-000004.hwx.site ctr-e134-1499953498516-54516-01-000004
      172.27.20.195 ctr-e134-1499953498516-54516-01-000002.hwx.site ctr-e134-1499953498516-54516-01-000002

      Attachments

        1. AMBARI-21614.patch
          3 kB
          Andrew Onischuk

            People

              Assignee: Andrew Onischuk (aonishuk)
              Reporter: Andrew Onischuk (aonishuk)