REEF-1399

Node stuck in group communication failure case

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.16

    Description

      Currently, in group communication (GC), if one of the tasks fails, all the other tasks wait forever. Because those tasks run in separate threads, this can easily cause a leak.
      There are two ways to resolve it:
      1. Add a timeout to the blocking calls in GC. If a call has waited long enough and still has not received any message, throw a group communication exception.
      2. Rely on fault tolerance to have the driver send a close event to those tasks. If a task is hung rather than iterating, force it to close after a timeout by throwing an exception.
      We will do the second in any case. The question is whether we should also do the first.
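
      A minimal Java sketch of the two options above, not REEF's actual implementation. The names NodeInbox, GroupCommunicationException, deliver, close, receive and the 10 ms polling slice are all assumptions made for illustration. The blocking wait is bounded by a deadline (option 1) and also checks a flag that a driver close event could set (option 2), so a waiting task cannot hang forever in either case.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class TimedReceiveSketch {

    /** Hypothetical exception type, not REEF's actual class. */
    static class GroupCommunicationException extends RuntimeException {
        GroupCommunicationException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the queue a task blocks on while waiting for a peer. */
    static class NodeInbox {
        // How often a waiting receiver re-checks the closed flag (assumed value).
        private static final long SLICE_MILLIS = 10;

        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private volatile boolean closed = false;

        /** Called when a peer's message arrives. */
        void deliver(String message) {
            queue.add(message);
        }

        /** Option 2: called when the driver's close event reaches the task. */
        void close() {
            closed = true;
        }

        /** Option 1: wait at most timeoutMillis for a message, then give up. */
        String receive(long timeoutMillis) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            try {
                while (true) {
                    if (closed) {
                        throw new GroupCommunicationException("inbox closed while waiting");
                    }
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        throw new GroupCommunicationException(
                            "no message received within " + timeoutMillis + " ms");
                    }
                    // Wait in short slices so a close() call is noticed promptly.
                    long waitNanos = Math.min(remainingNanos,
                        TimeUnit.MILLISECONDS.toNanos(SLICE_MILLIS));
                    String message = queue.poll(waitNanos, TimeUnit.NANOSECONDS);
                    if (message != null) {
                        return message;
                    }
                }
            } catch (InterruptedException e) {
                // Preserve the interrupt status and surface the failure to the caller.
                Thread.currentThread().interrupt();
                throw new GroupCommunicationException("interrupted while waiting");
            }
        }
    }

    public static void main(String[] args) {
        NodeInbox inbox = new NodeInbox();
        inbox.deliver("partial-sum");
        System.out.println(inbox.receive(1000));
        try {
            inbox.receive(50); // no peer ever sends, so this gives up after about 50 ms
        } catch (GroupCommunicationException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

      In this sketch the timeout alone bounds how long a thread can stay blocked, which is why option 1 still helps if the driver's close event never arrives. The trade-off is that a timeout set too low can fail a healthy but slow task.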

            People

              Assignee: Julia Wang (juliaw)
              Reporter: Julia Wang (juliaw)