Uploaded image for project: 'Giraph'
  1. Giraph
  2. GIRAPH-1077

Jobs getting stuck after channel failure

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels:
      None

      Description

      When a channel fails currently we just log the failure. Since we don't wait on open requests from every place, checking requests doesn't get called always, and we've seen issues with jobs staying stuck, for example during the input stage when request for split to read from worker to master fails. When we know that channel failed, we should try to resend the requests from that channel.

        Attachments

          Activity

            People

            • Assignee:
              majakabiljo Maja Kabiljo
              Reporter:
              majakabiljo Maja Kabiljo
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: