Uploaded image for project: 'Ignite'
  1. Ignite
  2. IGNITE-12793

Deadlock in the System Pool on Metadata processing

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.8, 2.7.6
    • None
    • None
    • None
    • Docs Required, Release Notes Required

    Description

      I've recently tried to apply Ilya's idea (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools and tried to set system pool to 3 in my own tests.
      It caused deadlock on a client node and I think it can happen not only on such small pool values.

      Details are following:
      I'm not using persistence currently (if it matters).
      On the client note I use ignite compute to call a job on every server node (there are 3 server nodes in the tests).

      Then I've found in logs:

      [10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773]
      
      grid-timeout-worker-#8
      
      [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?)
       [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]
      

      I see in threaddumps that all 3 system pool workers do the same - processing of job responses:

        "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000]
       java.lang.Thread.State: WAITING (parking)
       at sun.misc.Unsafe.park(Native Method)
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
       at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
       at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
       at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
       at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
       at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
       at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
       at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
       at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
       at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
       at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
       at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
       at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
       at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
       at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
       at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
       at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
       at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
       at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
       at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
       at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
       at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134) 
      

      As I found analyzing this stack trace, unmarshalling a user object the first time(per type) causes Binary metadata request (despite I've registered this type in BinaryConfiguration.setTypeConfiguration)

      And all this futures will be completed after consequent MetadataResponseMessage will be received and processed on the client node. But MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in system pool. (I see that method GridIoManager#processRegularMessage routes it to the System Pool).
      So it causes deadlock as the Sytem Pool is already full.

      Update

      I reproduced this on master on standard system pool size(see comments). Full threaddump is attached.

       

      Attachments

        1. ignite-12793-threaddump.txt
          147 kB
          Sergey Kosarev

        Activity

          People

            sergey-chugunov Sergey Chugunov
            macrergate Sergey Kosarev
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: