Ignite / IGNITE-5528

IS_EVICT_DISABLED flag is not cleared when cache store throws an exception

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7
    • Fix Version/s: 2.1
    • Component/s: cache
    • Labels: None

    Description

      The following was observed on a live system.
      On a large cluster with occasional topology changes, there is a sporadic hang that manifests itself as a "Failed to evict partition" message for one of the caches that has a cache store enabled. I took a heap dump and found that the hanging node held a single entry with the IS_EVICT_DISABLED flag set, while no thread was performing a store load operation. Earlier in the logs, the cache store had thrown a CacheLoaderException because its connection to the database was interrupted.

      Currently, the flag is set before the cache store load and cleared after the load.
      It looks like the flag leaks if the store throws an exception: it stays set, and the entry can no longer be cleared from the partition. As a result, the partition exchange on the next topology change freezes with a "Failed to wait for partition eviction" error message.
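
      The set-then-clear pattern described above, and one common way to guard it, can be sketched as follows. This is a minimal illustration, not Ignite's actual entry code and not necessarily the fix that was applied: the class EvictFlagSketch, the Entry type, the flag value 0x01, and the methods loadLeaky and loadSafe are all hypothetical names made up for this example, and a plain Supplier stands in for the cache store.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of the flag leak; not Ignite's real implementation. */
public class EvictFlagSketch {
    /** Illustrative bit value; the real constant in Ignite may differ. */
    static final int IS_EVICT_DISABLED = 0x01;

    /** Stand-in for a cache entry that carries a flags field. */
    static class Entry {
        int flags;
        Object val;

        /** An entry can be evicted only while the flag is not set. */
        boolean evictable() {
            return (flags & IS_EVICT_DISABLED) == 0;
        }

        /** Leaky variant: if the loader throws, the flag is never cleared. */
        Object loadLeaky(Supplier<Object> loader) {
            flags |= IS_EVICT_DISABLED;
            val = loader.get();              // an exception here skips the next line
            flags &= ~IS_EVICT_DISABLED;
            return val;
        }

        /** Guarded variant: the finally block clears the flag on every exit path. */
        Object loadSafe(Supplier<Object> loader) {
            flags |= IS_EVICT_DISABLED;
            try {
                val = loader.get();
                return val;
            } finally {
                flags &= ~IS_EVICT_DISABLED;
            }
        }
    }

    public static void main(String[] args) {
        // Simulates a store whose database connection was interrupted.
        Supplier<Object> failing = () -> {
            throw new IllegalStateException("connection interrupted");
        };

        Entry leaky = new Entry();
        try {
            leaky.loadLeaky(failing);
        } catch (IllegalStateException ignored) {
            // The caller sees the failure, but the flag stays set.
        }

        Entry safe = new Entry();
        try {
            safe.loadSafe(failing);
        } catch (IllegalStateException ignored) {
            // The caller sees the failure, and the flag has been cleared.
        }

        System.out.println("leaky evictable: " + leaky.evictable());
        System.out.println("safe evictable: " + safe.evictable());
    }
}
```

      Running main prints "leaky evictable: false" followed by "safe evictable: true": after a failed load, only the guarded variant leaves the entry in a state where it can be evicted.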

      The attached test reproduces this issue (note that the message appears only after one minute).

            People

              amashenkov Andrey Mashenkov
              agoncharuk Alexey Goncharuk
              Votes: 0
              Watchers: 2
