Ambari / AMBARI-22303

Spark history server is stopped (with umask 027 and custom spark log/pid dir)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.1
    • Component/s: None
    • Labels: None

    Description

      Steps to reproduce (STR):

      1. Deploy HDP 2.4.3.0-227 on Ambari 2.5.2.0 without Spark.
      2. Enable NameNode (NN) HA.
      3. Upgrade Ambari to 2.6.0.0.
      4. Register and install version 2.6.3.0-220, then perform a rolling upgrade (RU) to it.
      5. Add the Spark service.
      6. Wait for some time.

      Result: Spark History Server is stopped.
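The title names umask 027 and a custom Spark log/pid directory as conditions for the failure. As background only, here is a minimal Python sketch of what that umask does to a newly created directory. The directory name `spark-log-demo` and both helper functions are hypothetical and are not taken from Ambari or from the attached patch. It assumes a POSIX system with no default ACLs on the parent directory.

```python
import os
import stat
import tempfile


def mode_under_umask(requested_mode, umask):
    """Permission bits a new file or directory ends up with under a given umask."""
    return requested_mode & ~umask & 0o777


def make_dir_with_umask(parent, name, umask):
    """Create a directory under a temporary umask and return its permission bits."""
    previous = os.umask(umask)
    try:
        path = os.path.join(parent, name)
        os.mkdir(path)  # os.mkdir requests mode 0o777 by default
        return stat.S_IMODE(os.stat(path).st_mode) & 0o777
    finally:
        os.umask(previous)  # always restore the caller's umask


# Pure arithmetic: which bits survive umask 027 for directories and files.
print(oct(mode_under_umask(0o777, 0o027)))
print(oct(mode_under_umask(0o666, 0o027)))

# The same effect on a real (temporary) directory; "spark-log-demo" is a made-up name.
with tempfile.TemporaryDirectory() as parent:
    print(oct(make_dir_with_umask(parent, "spark-log-demo", 0o027)))
```

Under umask 027 a directory requested with mode 777 loses group write and all "other" access. If a custom log or pid directory is created by one user and used by another, that may be relevant. This report does not itself state that this is the cause.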

      Cluster: 172.27.62.82:8080 - nat-yc-r6-pgos-ambari-hv-r-upg-1 - 48h.

      Artifacts: <http://logserver.eng.hortonworks.com/?prefix=qelogs/nat/70440/ambari-hv-r-upg/split-1/nat-yc-r6-pgos-ambari-hv-r-upg-1/artifacts/ctr-e134-1499953498516-250582-01-000014.hwx.site/artifacts/screenshots/com.hw.ambari.ui.tests.monitoring.admin_page.rolling_express_upgrade.TestRegisterAndInstallNewStackVersion/test130_AddService/24_10_27_28_ComponentSpark_History_Server_not_started_on_host_ctr_e134_1499953498516_250582_01_0/>

      Spark logs: <http://logserver.eng.hortonworks.com/?prefix=qelogs/nat/70440/ambari-hv-r-upg/split-1/nat-yc-r6-pgos-ambari-hv-r-upg-1/var-logs/spark/ctr-e134-1499953498516-250582-01-000002.hwx.site/>

      From Spark.out

      The reported blocks 1677 has reached the threshold 0.9900 of total blocks 1677. The number of live datanodes 5 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 18 seconds.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1422)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2693)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2582)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
      Caused by: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create file/spark-history/.342200c3-6e9e-485c-8f08-d998cd9d92aa. Name node is in safe mode.
      The reported blocks 1677 has reached the threshold 0.9900 of total blocks 1677. The number of live datanodes 5 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 18 seconds.
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1418)
      ... 13 more
      , while invoking ClientNamenodeProtocolTranslatorPB.create over ctr-e134-1499953498516-250582-01-000002.hwx.site/172.27.56.143:8020. Retrying after sleeping for 22256ms.
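The excerpt above carries two timings that are easy to miss: how long the NameNode expected to stay in safe mode, and how long the client slept before retrying. A minimal Python sketch for pulling both out of such a log follows. The function name `safemode_timing` is hypothetical, and the patterns are written only against the wording seen in this excerpt.

```python
import re


def safemode_timing(log_text):
    """Return (seconds until safe mode ends, client retry sleep in seconds).

    Either element is None if the corresponding message is not in the text.
    """
    off_in = re.search(r"turned off automatically in (\d+) seconds", log_text)
    retry = re.search(r"Retrying after sleeping for (\d+)ms", log_text)
    seconds_left = int(off_in.group(1)) if off_in else None
    retry_sleep = int(retry.group(1)) / 1000.0 if retry else None
    return seconds_left, retry_sleep


# Two lines copied from the Spark.out excerpt in this report.
excerpt = (
    "In safe mode extension. Safe mode will be turned off automatically in 18 seconds.\n"
    ", while invoking ClientNamenodeProtocolTranslatorPB.create over "
    "ctr-e134-1499953498516-250582-01-000002.hwx.site/172.27.56.143:8020. "
    "Retrying after sleeping for 22256ms.\n"
)

seconds_left, retry_sleep = safemode_timing(excerpt)
print(seconds_left, retry_sleep)  # 18 22.256
```

In this excerpt the retry sleep (about 22 s) is longer than the remaining safe mode extension (18 s). So the safe mode error may have been transient at the time it was logged.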

      However, the NameNode (NN) is not in safe mode:

      [root@ctr-e134-1499953498516-250582-01-000002 ~]# hdfs dfsadmin -safemode get
      Safe mode is OFF in ctr-e134-1499953498516-250582-01-000013.hwx.site/172.27.69.83:8020
      Safe mode is OFF in ctr-e134-1499953498516-250582-01-000002.hwx.site/172.27.56.143:8020
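With NN HA enabled, `hdfs dfsadmin -safemode get` prints one line per NameNode, as shown above. A minimal Python sketch for turning that output into a per-NameNode status follows. The function names are hypothetical, and the parsing assumes the line format shown in this report.

```python
import re

# Matches lines such as:
#   Safe mode is OFF in host.example/10.0.0.1:8020
# The " in <address>" suffix is treated as optional.
_SAFEMODE_LINE = re.compile(r"Safe mode is (ON|OFF)(?: in (\S+))?")


def parse_safemode_report(text):
    """Map each NameNode address to True if it reports safe mode ON."""
    states = {}
    for line in text.splitlines():
        match = _SAFEMODE_LINE.search(line)
        if match:
            address = match.group(2) or "default"
            states[address] = match.group(1) == "ON"
    return states


def any_in_safemode(text):
    """True if at least one NameNode in the report is in safe mode."""
    return any(parse_safemode_report(text).values())


# Output copied from this report.
sample = """\
Safe mode is OFF in ctr-e134-1499953498516-250582-01-000013.hwx.site/172.27.69.83:8020
Safe mode is OFF in ctr-e134-1499953498516-250582-01-000002.hwx.site/172.27.56.143:8020
"""

for address, in_safemode in parse_safemode_report(sample).items():
    print(address, "ON" if in_safemode else "OFF")
print(any_in_safemode(sample))  # False
```

To use it against a live cluster, the text could be captured with `subprocess.run(["hdfs", "dfsadmin", "-safemode", "get"], capture_output=True, text=True).stdout`, assuming the `hdfs` client is on the PATH.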

      Attachments

        1. AMBARI-22303.patch (5 kB, Andrew Onischuk)

            People

              Assignee: Andrew Onischuk (aonishuk)
              Reporter: Andrew Onischuk (aonishuk)