Hadoop HDFS / HDFS-7240 Scaling HDFS / HDFS-12361

Ozone: SCM failed to start when a container metadata is empty


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • HDFS-7240
    • None
    • Component/s: ozone, scm
    • None
    • Hadoop Flags: Reviewed

    Description

      When I ran tests that create keys via corona, they sometimes left some containers with empty metadata. This can also happen if SCM stops at a point where the metadata has not yet been written. When it does, we get the following error and SCM cannot be started:

      17/08/27 20:10:57 WARN datanode.DataNode: Unexpected exception in block pool Block pool BP-821804790-172.16.165.133-1503887277256 (Datanode Uuid 7ee16a59-9604-406e-a0f8-6f44650a725b) service to ozone1.fyre.ibm.com/172.16.165.133:8111
      java.lang.NullPointerException
      	at org.apache.hadoop.ozone.container.common.helpers.ContainerData.getFromProtBuf(ContainerData.java:66)
      	at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.readContainerInfo(ContainerManagerImpl.java:210)
      	at org.apache.hadoop.ozone.container.common.impl.ContainerManagerImpl.init(ContainerManagerImpl.java:158)
      	at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:99)
      	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:77)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.bpRegistrationSucceeded(DataNode.java:1592)
      	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.registrationSucceeded(BPOfferService.java:409)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:783)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:286)
      	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
      	at java.lang.Thread.run(Thread.java:745)
      

      We should add a null check to avoid the NPE, and mark such containers as inactive instead of letting them stop SCM from starting.
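      A minimal sketch of the proposed handling follows. It assumes the metadata is read with a parser that, like protobuf's `parseDelimitedFrom`, returns `null` on an empty stream, and that the `null` result was then dereferenced in `getFromProtBuf`. Every name below (`ContainerLoadSketch`, `ContainerInfo`, `parseName`, `readContainerInfo`, `loadAll`) is a hypothetical stand-in, not the Ozone API, and metadata is modeled as in-memory bytes rather than files on disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual Ozone ContainerManagerImpl code.
class ContainerLoadSketch {

  /** Hypothetical stand-in for ContainerData: a container name plus an active flag. */
  static final class ContainerInfo {
    final String name;
    final boolean active;

    ContainerInfo(String name, boolean active) {
      this.name = name;
      this.active = active;
    }
  }

  /**
   * Hypothetical parser. Like protobuf's parseDelimitedFrom, it returns null
   * (rather than throwing) when there is nothing to read, e.g. an empty file.
   */
  static String parseName(byte[] metadata) {
    if (metadata == null || metadata.length == 0) {
      return null;
    }
    return new String(metadata, StandardCharsets.UTF_8).trim();
  }

  /** Reads one container; empty metadata yields an inactive entry instead of an NPE. */
  static ContainerInfo readContainerInfo(String containerName, byte[] metadata) {
    String parsed = parseName(metadata);
    if (parsed == null || parsed.isEmpty()) {
      // The null check: log and keep going rather than dereferencing null.
      System.err.println("WARN: empty metadata for container "
          + containerName + "; marking it inactive");
      return new ContainerInfo(containerName, false);
    }
    return new ContainerInfo(parsed, true);
  }

  /** Loads every container; one bad record no longer aborts the whole startup. */
  static Map<String, ContainerInfo> loadAll(Map<String, byte[]> metadataByContainer) {
    Map<String, ContainerInfo> loaded = new TreeMap<>();
    for (Map.Entry<String, byte[]> entry : metadataByContainer.entrySet()) {
      loaded.put(entry.getKey(), readContainerInfo(entry.getKey(), entry.getValue()));
    }
    return loaded;
  }

  public static void main(String[] args) {
    Map<String, byte[]> metadata = new LinkedHashMap<>();
    metadata.put("c1", "c1".getBytes(StandardCharsets.UTF_8));
    metadata.put("c2", new byte[0]); // a container left with empty metadata
    for (ContainerInfo info : loadAll(metadata).values()) {
      System.out.println(info.name + " active=" + info.active);
    }
  }
}
```

      Running `main` prints `c1 active=true` and `c2 active=false`, plus a warning on stderr. Marking the container inactive, rather than silently dropping it, keeps it visible so it can be inspected or cleaned up later; the actual fix may choose a different policy.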


          People

            Assignee: Weiwei Yang (cheersyang)
            Reporter: Weiwei Yang (cheersyang)
            Votes: 0
            Watchers: 4
