  1. Apache Ozone
  2. HDDS-4722

Creating RDBStore fails due to RDBMetrics instance race

Details

    Description

      I am using Ozone APIs to create containers, and the run occasionally aborts due to a data race in accessing the RDBMetrics instance:

      2021-01-09 02:39:36,944 [pool-1-thread-4] INFO keyvalue.KeyValueContainer: Container 318054 is closed with bcsId 0.
      2021-01-09 02:39:36,988 [pool-1-thread-17] ERROR freon.BaseFreonGenerator: Error on executing task 318048
      com.google.common.util.concurrent.UncheckedExecutionException: org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
              at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2051)
              at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
              at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
              at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
              at org.apache.hadoop.ozone.freon.ContainerGenerator.lambda$writeContainer$1(ContainerGenerator.java:489)
              at com.codahale.metrics.Timer.time(Timer.java:101)
              at org.apache.hadoop.ozone.freon.ContainerGenerator.writeContainer(ContainerGenerator.java:485)
              at org.apache.hadoop.ozone.freon.BaseFreonGenerator.tryNextTask(BaseFreonGenerator.java:189)
              at org.apache.hadoop.ozone.freon.BaseFreonGenerator.taskLoop(BaseFreonGenerator.java:169)
              at org.apache.hadoop.ozone.freon.BaseFreonGenerator.lambda$startTaskRunners$0(BaseFreonGenerator.java:152)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at java.base/java.lang.Thread.run(Thread.java:834)
      Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source RDBMetrics already exists!
              at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
              at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
              at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
              at org.apache.hadoop.hdds.utils.db.RDBMetrics.create(RDBMetrics.java:47)
              at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:152)
              at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:191)
              at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:128)
              at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:103)
              at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaTwoImpl.<init>(DatanodeStoreSchemaTwoImpl.java:48)
              at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.createContainerMetaData(KeyValueContainerUtil.java:112)
              at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:133)
              at org.apache.hadoop.ozone.freon.ContainerGenerator.createContainer(ContainerGenerator.java:463)
              at org.apache.hadoop.ozone.freon.ContainerGenerator.access$100(ContainerGenerator.java:109)
              at org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:357)
              at org.apache.hadoop.ozone.freon.ContainerGenerator$ContainerCreator.load(ContainerGenerator.java:353)
              at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
              at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
      

      Looking at the code, I believe RDBMetrics#unRegister() should be made synchronized. Otherwise, concurrently creating and closing RDBStore objects can race on the shared RDBMetrics instance.

      After making RDBMetrics#unRegister() synchronized, the tool no longer aborts due to the race.
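
      The race and the proposed fix can be illustrated with a minimal, self-contained sketch. This is not the actual RDBMetrics code: the class MetricsRaceSketch, the REGISTRY set standing in for the metrics system, and the stress() helper are all hypothetical names introduced for illustration. The sketch assumes that create() is already synchronized and that unRegister() clears the shared instance and then removes the source name. If unRegister() were not synchronized, another thread could observe the instance as null while the name is still registered, and registration would fail with "already exists".

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MetricsRaceSketch {
    // Hypothetical stand-in for a metrics system that rejects duplicate source names.
    static final Set<String> REGISTRY = ConcurrentHashMap.newKeySet();
    static final String SOURCE_NAME = "RDBMetrics";

    // Shared singleton, guarded by the class lock.
    private static MetricsRaceSketch instance;

    private MetricsRaceSketch() { }

    // Returns the shared instance, registering the source name on first use.
    public static synchronized MetricsRaceSketch create() {
        if (instance != null) {
            return instance;
        }
        if (!REGISTRY.add(SOURCE_NAME)) {
            throw new IllegalStateException(
                "Metrics source " + SOURCE_NAME + " already exists!");
        }
        instance = new MetricsRaceSketch();
        return instance;
    }

    // The proposed fix: take the same lock as create(), so that clearing the
    // instance and removing the source name appear atomic to other threads.
    public static synchronized void unRegister() {
        instance = null;
        REGISTRY.remove(SOURCE_NAME);
    }

    // Runs create()/unRegister() pairs on several threads and counts
    // how many times registration failed with "already exists".
    static int stress(int threads, int iterations) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        create();
                        unRegister();
                    } catch (IllegalStateException e) {
                        failures.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failures: " + stress(8, 10_000));
    }
}
```

      With both methods holding the class lock, create() can never see a null instance while the name is still registered, so no failure is expected. Removing synchronized from unRegister() in this sketch reopens the window between the two statements, which is the interleaving I suspect is behind the exception above.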

            People

              weichiu Wei-Chiu Chuang