  1. Samza
  2. SAMZA-2165

Account for coordinator restarts in calls to status

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.2
    • None
    • None

    Description

      Currently, the status of a Samza job is determined by a combination of:
      1. Obtaining YARN's status for the job by querying the RM
      2. Obtaining the AM/coordinator URL for the job
      3. If (1) is "Running", querying the job's coordinator URL to check whether all containers have started

      YARN may restart the coordinator between (2) and (3), in which case the old coordinator process may no longer be alive and the query in (3) triggers a ConnectException. This causes the status call to fail.

      A better way to handle these retriable errors is to return a "New" status from the API, so that applications can keep polling.
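
      The proposed handling of step (3) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: JobStatus, CoordinatorClient, and StatusSketch are hypothetical names, the sketch assumes YARN has already reported "Running" in step (1), and it assumes "New" is also the status returned while containers are still starting.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical status values for illustration; not Samza's actual status type.
enum JobStatus { NEW, RUNNING }

// Hypothetical abstraction over the HTTP call to the coordinator in step (3).
interface CoordinatorClient {
    boolean allContainersStarted(String coordinatorUrl) throws IOException;
}

final class StatusSketch {
    private StatusSketch() {}

    // Precondition (assumed): YARN reported "Running" for the job in step (1),
    // and coordinatorUrl was fetched in step (2).
    static JobStatus statusWhenYarnRunning(String coordinatorUrl,
                                           CoordinatorClient coordinator) throws IOException {
        try {
            // Step (3): ask the coordinator whether all containers have started.
            return coordinator.allContainersStarted(coordinatorUrl)
                    ? JobStatus.RUNNING
                    : JobStatus.NEW;
        } catch (ConnectException e) {
            // The coordinator may have been restarted between steps (2) and (3),
            // so the URL can point at a dead process. Treat this as retriable:
            // report NEW so that the caller keeps polling.
            return JobStatus.NEW;
        }
        // Any other IOException still propagates to the caller.
    }
}
```

      With this shape, a caller that polls until it sees a "Running" status picks up the new coordinator URL on its next call instead of failing on the stale one.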

          People

            jagadish1989@gmail.com Jagadish
            jagadish1989@gmail.com Jagadish

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h