Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-16043

HDFS tests - "Command processor" thread leak

    XMLWordPrintableJSON

Details

    • Task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 9.0
    • hdfs, Tests
    • None

    Description

      This is caused by SOLR-15942. Some background:

      It looks like it might be trying to shut down:

       2> 31835 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor encountered interrupt and exit.
        2> 31835 WARN  (BP-1491942626-127.0.0.1-1645493532707 heartbeating to localhost/127.0.0.1:51572) [] o.a.h.h.s.d.IncrementalBlockReportManager IncrementalBlockReportManager interrupted
        2> 31835 WARN  (Command processor) [] o.a.h.h.s.d.DataNode Ending command processor service for: Thread[Command processor,5,TGRP-HdfsBackupRepositoryIntegrationTest]
      

      the error message looks like (this is from https://jenkins.thetaphi.de/job/Solr-main-MacOSX/678/consoleText which is attached Solr-main-MACOSX-678_consoleText.txt.gz ):

         2> 31948 WARN  (Listener at localhost/51675) [] o.a.h.h.s.d.f.i.FsDatasetAsyncDiskService AsyncDiskService has already shut down.
        2> 31948 WARN  (Listener at localhost/51675) [] o.a.h.h.s.d.f.i.RamDiskAsyncLazyPersistService AsyncLazyPersistService has already shut down.
        2> 31948 INFO  (Listener at localhost/51675) [] o.a.h.h.s.d.DataNode Shutdown complete.
        2> 32290 INFO  (Listener at localhost/51675) [] o.a.s.u.ErrorLogMuter Closing ErrorLogMuter-regex-1 after mutting 0 log messages
        2> 32290 INFO  (Listener at localhost/51675) [] o.a.s.u.ErrorLogMuter Creating ErrorLogMuter-regex-2 for ERROR logs matching regex: ignore_exception
        2> Feb 22, 2022 1:32:34 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
        2> WARNING: Will linger awaiting termination of 1 leaked thread(s).
        2> Feb 22, 2022 1:32:35 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
        2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.core.backup.repository.HdfsBackupRepositoryIntegrationTest: 
        2>    1) Thread[id=90, name=Command processor, state=WAITING, group=TGRP-HdfsBackupRepositoryIntegrationTest]
        2>         at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method)
        2>         at java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
        2>         at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505)
        2>         at java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
        2>         at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614)
        2>         at java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
        2>         at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331)
        2>         at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315)
        2> Feb 22, 2022 1:32:35 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
        2> INFO: Starting to interrupt leaked threads:
        2>    1) Thread[id=90, name=Command processor, state=WAITING, group=TGRP-HdfsBackupRepositoryIntegrationTest]
        2> 33504 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor encountered interrupt and exit.
        2> 33504 WARN  (Command processor) [] o.a.h.h.s.d.DataNode Ending command processor service for: Thread[Command processor,5,TGRP-HdfsBackupRepositoryIntegrationTest]
        2> Feb 22, 2022 1:32:35 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
        2> INFO: All leaked threads terminated.
         >     com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.core.backup.repository.HdfsBackupRepositoryIntegrationTest: 
         >        1) Thread[id=90, name=Command processor, state=WAITING, group=TGRP-HdfsBackupRepositoryIntegrationTest]
         >             at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method)
         >             at java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
         >             at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505)
         >             at java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
         >             at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614)
         >             at java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
         >             at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331)
         >             at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315)
         >         at __randomizedtesting.SeedInfo.seed([1A5FAFA6CE7F7FF0]:0)
        2> NOTE: test params are: codec=Asserting(Lucene90): {}, docValues:{}, maxPointsInLeafNode=757, maxMBSortInHeap=5.99773708873607, sim=Asserting(RandomSimilarity(queryNorm=false): {}), locale=sah-RU, timezone=Asia/Macau
        2> NOTE: Mac OS X 10.14.6 x86_64/Eclipse Foundation 16.0.2 (64-bit)/cpus=6,threads=40,free=159527368,total=271581184
        2> NOTE: All tests run in this JVM: [HdfsBackupRepositoryIntegrationTest]
        2> NOTE: reproduce with: gradlew test --tests HdfsBackupRepositoryIntegrationTest -Dtests.seed=1A5FAFA6CE7F7FF0 -Dtests.slow=true -Dtests.locale=sah-RU -Dtests.timezone=Asia/Macau -Dtests.asserts=true -Dtests.file.encoding=UTF-8
      

      Some related code pointers:

      PS: there is an error about datanode mbean - but it is unrelated as far as I can tell - https://issues.apache.org/jira/browse/HDFS-11041

      Attachments

        Issue Links

          Activity

            People

              krisden Kevin Risden
              krisden Kevin Risden
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m