REEF (Retired) / REEF-817

Group comm hangs when root task is added after child tasks start running

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13
    • Fix Version/s: 0.14
    • Component/s: REEF-IO
    • Labels: None

    Description

      The Java-side group communication service implicitly assumes that the root task (also called the controller task or master task) is added to the topology before any child task starts running. This is not always true: for example, the evaluator on which the root task should run may be delayed due to mechanical issues. Topology formation starts only after the root task has been added, so child tasks that start early never learn who their parent or children are, even if the root task is added later. This usually leads to a job timeout. The bug can be reproduced by deliberately delaying the call to CommunicationGroupDriver.addTask(rootTaskConf), for instance with a simple Thread.sleep().
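
      The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyTopology, addRoot, onChildRunning and the replayOnRootAdd flag are hypothetical names invented for illustration and are not REEF classes or methods, the tree is assumed to be flat (every child hangs directly off the root), and replaying early children when the root arrives is one possible remedy, not necessarily the fix applied in REEF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes are REEF APIs.
public class LateRootSketch {

    /** Hypothetical stand-in for the driver-side topology of one communication group. */
    static class ToyTopology {
        private final boolean replayOnRootAdd;      // true = notify early children when the root arrives
        private String root;                        // null until the root task is added
        private final List<String> running = new ArrayList<>();
        private final Map<String, String> parentOf = new LinkedHashMap<>();

        ToyTopology(boolean replayOnRootAdd) {
            this.replayOnRootAdd = replayOnRootAdd;
        }

        /** Adds the root task; optionally links children that were already running. */
        void addRoot(String rootId) {
            root = rootId;
            if (replayOnRootAdd) {
                for (String child : running) {
                    parentOf.put(child, root);
                }
            }
        }

        /** Called when a child task starts; it learns its parent only if the root exists. */
        void onChildRunning(String childId) {
            running.add(childId);
            if (root != null) {
                parentOf.put(childId, root);
            }
        }

        boolean knowsParent(String childId) {
            return parentOf.containsKey(childId);
        }
    }

    /** Starts two children, then adds the root, then starts a third child; returns children with no parent. */
    static List<String> orphans(boolean replayOnRootAdd) {
        ToyTopology topology = new ToyTopology(replayOnRootAdd);
        topology.onChildRunning("child-0");   // starts before the root exists
        topology.onChildRunning("child-1");   // starts before the root exists
        topology.addRoot("root");             // root is added late
        topology.onChildRunning("child-2");   // starts after the root exists

        List<String> result = new ArrayList<>();
        for (String child : List.of("child-0", "child-1", "child-2")) {
            if (!topology.knowsParent(child)) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("without replay, orphans: " + orphans(false));
        System.out.println("with replay, orphans: " + orphans(true));
    }
}
```

      In this model, the children that start before the root stay without a parent unless the root's arrival triggers a replay. In a real job such children would wait for topology information that never arrives, which is consistent with the timeout reported here.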

            People

              Assignee: jsjason Joo Seong Jeong
              Reporter: jsjason Joo Seong Jeong
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: