Apache Ozone / HDDS-6236

SCM receives reports of unknown containers


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      We have noticed the following log messages in the SCM leader and followers for multiple containers:

      2022-01-19 12:53:24,021 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 1368 from datanode { ... }
      
      org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: ID #1368
              at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
              at java.base/java.util.Optional.orElseThrow(Optional.java:408)
              at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
              at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:94)
              at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:165)
              at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:133)
              at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:48)
              at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:834)
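
      The trace shows the failure path. ContainerReportHandler passes each reported replica to AbstractContainerReportHandler.processContainerReplica. That method calls ContainerManagerImpl.getContainer, which throws ContainerNotFoundException through Optional.orElseThrow when SCM has no record of the ID, and the ERROR above is logged under ContainerReportHandler. The Java sketch below models only that lookup-and-log pattern. SimpleContainerManager, SimpleReportHandler and the local ContainerNotFoundException are hypothetical simplifications. They are not Ozone's actual classes or signatures.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for illustration only; NOT Ozone's real exception class.
class ContainerNotFoundException extends Exception {
    ContainerNotFoundException(String message) {
        super(message);
    }
}

// Hypothetical, simplified container manager: an in-memory map of
// containerId -> state stands in for SCM's container metadata.
class SimpleContainerManager {
    private final Map<Long, String> containers = new ConcurrentHashMap<>();

    void addContainer(long id, String state) {
        containers.put(id, state);
    }

    // Same Optional.orElseThrow pattern that appears in the stack trace.
    String getContainer(long id) throws ContainerNotFoundException {
        return Optional.ofNullable(containers.get(id))
                .orElseThrow(() -> new ContainerNotFoundException("ID #" + id));
    }
}

// Hypothetical report handler: looks up each reported replica's container.
class SimpleReportHandler {
    private final SimpleContainerManager manager;

    SimpleReportHandler(SimpleContainerManager manager) {
        this.manager = manager;
    }

    // Returns true if the replica belongs to a container SCM knows about,
    // false if it was logged as unknown (the case described in this issue).
    boolean processReplica(long containerId, String datanode) {
        try {
            manager.getContainer(containerId);
            return true;
        } catch (ContainerNotFoundException e) {
            System.err.println("Received container report for an unknown container "
                    + containerId + " from datanode " + datanode + ": " + e.getMessage());
            return false;
        }
    }
}

public class UnknownContainerDemo {
    public static void main(String[] args) {
        SimpleContainerManager manager = new SimpleContainerManager();
        manager.addContainer(1L, "CLOSED");
        SimpleReportHandler handler = new SimpleReportHandler(manager);
        System.out.println(handler.processReplica(1L, "dn-1"));    // prints "true"
        System.out.println(handler.processReplica(1368L, "dn-1")); // prints "false"; error goes to stderr
    }
}
```

      In this model, the report itself is harmless. The error only means the reported ID is missing from SCM's view, which is why the question becomes how the record disappeared.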
      
      

      The cluster currently runs SCM HA, but the issue was also observed while it was a non-HA cluster. It appears to affect only empty containers, since no data seems to be missing. Containers are supposed to remain in the SCM DB even after they have been deleted from the datanodes, so there seems to be a bug in SCM's container persistence logic.
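
      The expectation stated above can be modeled as an invariant. Deleting a container's replicas from datanodes should change the container's state in SCM's table, but it should not delete the table entry, so a late report for that ID still resolves. The Java sketch below assumes a simplified state set and uses an in-memory map in place of the SCM DB. ContainerTable, markDeleted and buggyRemove are hypothetical names. buggyRemove shows one conceivable way the symptom could arise and is not a diagnosis of the actual bug.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified lifecycle states assumed for this sketch only.
enum State { OPEN, CLOSED, DELETING, DELETED }

// Hypothetical in-memory stand-in for SCM's persisted container table.
class ContainerTable {
    private final Map<Long, State> table = new HashMap<>();

    void create(long id) { table.put(id, State.OPEN); }

    void close(long id) { table.replace(id, State.CLOSED); }

    // Expected behaviour per the issue: the record outlives the replicas,
    // only its state changes.
    void markDeleted(long id) { table.replace(id, State.DELETED); }

    // Hypothetical faulty path: the record itself is dropped, so any later
    // report for this ID would be treated as an unknown container.
    void buggyRemove(long id) { table.remove(id); }

    boolean isKnown(long id) { return table.containsKey(id); }

    State stateOf(long id) { return table.get(id); }
}

public class ContainerPersistenceDemo {
    public static void main(String[] args) {
        ContainerTable scm = new ContainerTable();

        scm.create(1368L);
        scm.close(1368L);
        scm.markDeleted(1368L);
        System.out.println(scm.isKnown(1368L) + " " + scm.stateOf(1368L)); // prints "true DELETED"

        scm.create(1369L);
        scm.buggyRemove(1369L);
        System.out.println(scm.isKnown(1369L)); // prints "false"
    }
}
```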

    People

      Assignee: Swaminathan Balachandran (swamirishi)
      Reporter: Ethan Rose (erose)
      Votes: 0
      Watchers: 4