Ignite / IGNITE-21059

We upgraded our Ignite instance from 2.7.6 to 2.14 and found long-running cache operations


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 2.14
    • Fix Version/s: None
    • Component/s: binary, clients
    • Labels: None
    • Ignite Flags: Docs Required, Release Notes Required

    Description

      We recently upgraded from 2.7.6 to 2.14 because of an issue observed in our production environment, where the cluster would go into a hung state during partition map exchange.

      Please see the ticket below, which I created a while back for Ignite 2.7.6:

      https://issues.apache.org/jira/browse/IGNITE-13298

      So we migrated to Apache Ignite 2.14. The upgrade went smoothly, but on the third day we saw cluster traffic dip again.

      We have 5 nodes in a cluster, to which we provide 400 GB of RAM and more than 1 TB of SSD storage.

      Please find the config attached for review.

      I have also attached the server logs from the time when the issue happened.

      We have set the transaction timeout as well as the socket timeout, at both the server and the client end, for our write operations. Even so, the cluster sometimes goes into a hung state: all our get calls get stuck, our JMS listener threads slowly start to freeze, and after some time every thread ends up choked.

      As a result, our read services, which do not even use transactions to retrieve data, also start to choke, ultimately leading to a dip in end-user traffic.

      We were hoping the product upgrade would help, but that has not been the case so far.
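
      For readers without access to the attachment, the kind of timeout settings referred to above can be sketched in Spring XML roughly as follows. This is only an illustration under stated assumptions: every value is a placeholder and is not taken from cache-config-1.xml, and whether our config sets txTimeoutOnPartitionMapExchange is not stated in this ticket. The property names correspond to setters on Ignite's TransactionConfiguration and TcpCommunicationSpi classes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="transactionConfiguration">
            <bean class="org.apache.ignite.configuration.TransactionConfiguration">
                <!-- Default timeout for transactions, in milliseconds (placeholder value). -->
                <property name="defaultTxTimeout" value="30000"/>
                <!-- Timeout applied to open transactions when a partition map
                     exchange is pending, in milliseconds (placeholder value;
                     assumption: not confirmed to be set in the attached config). -->
                <property name="txTimeoutOnPartitionMapExchange" value="20000"/>
            </bean>
        </property>

        <property name="communicationSpi">
            <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
                <!-- Socket write timeout, in milliseconds (placeholder value). -->
                <property name="socketWriteTimeout" value="5000"/>
            </bean>
        </property>
    </bean>
</beans>
```

      The same fragment would apply to both server and client nodes, since each starts from its own IgniteConfiguration; the actual values we use are in the attached file.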


      Attachments

        1. cache-config-1.xml (27 kB, Vipul Thakur)
        2. client-service.zip (413 kB, Vipul Thakur)
        3. digiapi-eventprocessing-app-zone1-6685b8d7f7-ntw27.log (22.19 MB, Vipul Thakur)
        4. digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt1 (1.30 MB, Vipul Thakur)
        5. digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt2 (1.31 MB, Vipul Thakur)
        6. digiapi-eventprocessing-app-zone1-696c8c4946-62jbx-jstck.txt3 (1.32 MB, Vipul Thakur)
        7. digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt1 (1.46 MB, Vipul Thakur)
        8. digiapi-eventprocessing-app-zone1-696c8c4946-7d57w-jstck.txt2 (1.45 MB, Vipul Thakur)
        9. ignite_issue_1101.zip (375 kB, Vipul Thakur)
        10. Ignite_server_logs.zip (28.65 MB, Vipul Thakur)
        11. ignite-server-nohup.out (12.64 MB, Vipul Thakur)
        12. ignite-server-nohup-1.out (12.64 MB, Vipul Thakur)
        13. image.png (32 kB, Vipul Thakur)
        14. image-2024-01-11-22-28-51-501.png (281 kB, Vipul Thakur)
        15. long_txn_.png (946 kB, Vipul Thakur)
        16. nohup_12.out (7.30 MB, Vipul Thakur)


          People

            Assignee: Unassigned
            Reporter: Vipul Thakur (vipul.thakur)
