Apache Ozone / HDDS-5032

DN stops loading the remaining containers on a volume after a container load exception


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None

    Description

      We have hit two kinds of container loading exceptions. One throws a RuntimeException and is fixed by HDDS-4722. In the other, I backed up a container directory under the name ContainerID-Backup, which triggered a badly formatted container directory name exception.

      In both cases, the large number of containers remaining on the same volume are not loaded. Although the DN starts and runs healthily, SCM treats all of these container replicas as missing and schedules many replica replication tasks.

      This task fixes the issue: if loading a specific container throws an exception, log it and continue loading the next container.
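
      A minimal sketch of the intended behavior, assuming a simplified loader: the try/catch sits inside the per-container loop, so a failure is logged and only that one container is skipped. The log lines in the cases below say "during reading container files from Volume", which suggests the exception is currently caught at the volume level. The class PerContainerLoadSketch and its methods are hypothetical and are not the actual ContainerReader code; parseContainerId stands in for the real per-container loading work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not Ozone's ContainerReader): load every container
 * directory on a volume, isolating failures per container so that one bad
 * directory does not stop the remaining containers from loading.
 */
public class PerContainerLoadSketch {
  private static final Logger LOG =
      Logger.getLogger(PerContainerLoadSketch.class.getName());

  /** Hypothetical stand-in for loading one container from its directory name. */
  static long parseContainerId(String dirName) {
    // Throws NumberFormatException for names such as "1823-raw".
    return Long.parseLong(dirName);
  }

  /** Returns the IDs that loaded successfully; failures are logged and skipped. */
  static List<Long> loadAll(List<String> containerDirNames) {
    List<Long> loaded = new ArrayList<>();
    for (String name : containerDirNames) {
      try {
        loaded.add(parseContainerId(name));
      } catch (RuntimeException e) {
        // Log and move on to the next container instead of abandoning the volume.
        LOG.log(Level.WARNING, "Failed to load container directory " + name, e);
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    List<Long> ids = loadAll(List.of("1821", "1823-raw", "1824"));
    System.out.println(ids); // [1821, 1824]
  }
}
```

      Running main prints [1821, 1824]: the "1823-raw" directory is logged and skipped, while the container after it still loads.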

      Case 1:
      2021-03-12 20:46:16,420 [Thread-8] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data3/hdds/hdds {}
      java.lang.NumberFormatException: For input string: "1823-raw"
      at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
      at java.lang.Long.parseLong(Long.java:589)
      at java.lang.Long.parseLong(Long.java:631)
      at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerID(ContainerUtils.java:242)
      at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.getContainerFile(ContainerUtils.java:234)
      at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:132)
      at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
      at java.lang.Thread.run(Thread.java:748)
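
      In Case 1 the trace shows ContainerUtils.getContainerID handing "1823-raw", presumably the container directory name, to Long.parseLong, which throws NumberFormatException. A hedged sketch of a tolerant parse, using a hypothetical helper tryParseContainerId (not an Ozone API), lets a caller skip a non-numeric directory name instead of propagating the exception:

```java
import java.util.OptionalLong;

public class ContainerDirNameSketch {
  /**
   * Hypothetical helper: returns the container ID encoded in a directory name,
   * or empty if the name is not a plain decimal number (e.g. "1823-raw").
   */
  static OptionalLong tryParseContainerId(String dirName) {
    try {
      return OptionalLong.of(Long.parseLong(dirName));
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(tryParseContainerId("1823"));     // OptionalLong[1823]
    System.out.println(tryParseContainerId("1823-raw")); // OptionalLong.empty
  }
}
```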

      Case 2:
      2021-03-25 10:15:47,502 [Thread-15] ERROR org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader: Caught a Run time exception during reading container files from Volume /data5/hdds/hdds {}
      org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
      at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
      at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
      at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
      at org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
      at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
      at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
      at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
      at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
      at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaOneImpl.<init>(DatanodeStoreSchemaOneImpl.java:40)
      at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getUncachedDatanodeStore(BlockUtils.java:68)
      at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getUncachedDatanodeStore(BlockUtils.java:93)
      at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:195)
      at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyAndFixupContainerData(ContainerReader.java:181)
      at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.verifyContainerFile(ContainerReader.java:158)
      at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.readVolume(ContainerReader.java:136)
      at org.apache.hadoop.ozone.container.ozoneimpl.ContainerReader.run(ContainerReader.java:91)
      at java.lang.Thread.run(Thread.java:748)
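
      In Case 2 the trace shows RDBStore registering an RDBMetrics metrics source while a container's DB store is being opened, and the registration is rejected because a source with that name already exists. The toy registry below is a hypothetical model of that rejection (it is not Hadoop's metrics2 API): a second registration under the same fixed name throws, and an unchecked exception of this kind is what reached the volume-level handler in the log above.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model (hypothetical, not Hadoop's metrics2 API): a registry that rejects
 * a second source with the same name, mirroring the
 * "Metrics source RDBMetrics already exists!" failure in Case 2.
 */
public class DuplicateSourceSketch {
  private static final Set<String> SOURCES = new HashSet<>();

  static void register(String name) {
    // Set.add returns false when the name is already registered.
    if (!SOURCES.add(name)) {
      throw new IllegalStateException("Metrics source " + name + " already exists!");
    }
  }

  public static void main(String[] args) {
    register("RDBMetrics"); // first registration succeeds
    try {
      register("RDBMetrics"); // second registration under the same name
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Metrics source RDBMetrics already exists!
    }
  }
}
```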

          People

            Assignee: Sammi Chen
            Reporter: Sammi Chen
            Votes: 0
            Watchers: 2