  1. Samza
  2. SAMZA-2165

Account for coordinator restarts in calls to status

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: None
    • Labels: None

      Description

      Currently, the status of a Samza job is determined by a combination of:
      1. Obtaining YARN's status for the job by querying the RM
      2. Obtaining the AM/coordinator URL for the job
      3. If (1) is "Running", querying the job's coordinator URL to check whether all containers have started

      YARN may restart the coordinator between (2) and (3), in which case the old coordinator process may no longer be alive and the query in (3) fails with a ConnectException. This causes the status call to fail.

      A better way to handle these retriable errors is to return a "New" status from the API, so that applications can keep polling.
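
The proposed handling could be sketched roughly as follows. This is a minimal illustration, not Samza's actual client code: the class `JobStatusSketch`, the `Status` enum, and the `CoordinatorClient` interface are hypothetical names introduced here. It also assumes that a job whose containers have not all started is reported as "New", which the description above does not state explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;

// Hypothetical sketch; none of these names are taken from the Samza code base.
class JobStatusSketch {

    // Simplified stand-in for the statuses a job can report.
    enum Status { NEW, RUNNING, FINISHED }

    // Stand-in for the HTTP query against the coordinator URL in step (3).
    interface CoordinatorClient {
        boolean allContainersStarted(String coordinatorUrl) throws IOException;
    }

    // Combines YARN's status (step 1) with the coordinator query (step 3).
    static Status resolveStatus(Status yarnStatus, String coordinatorUrl,
                                CoordinatorClient client) {
        if (yarnStatus != Status.RUNNING) {
            // The coordinator is only consulted when YARN reports "Running".
            return yarnStatus;
        }
        try {
            // Assumption: not all containers started yet is reported as NEW.
            return client.allContainersStarted(coordinatorUrl)
                    ? Status.RUNNING
                    : Status.NEW;
        } catch (ConnectException e) {
            // The coordinator was probably restarted after its URL was fetched,
            // so the URL is stale. Treat this as retriable and report NEW so
            // that callers keep polling instead of failing.
            return Status.NEW;
        } catch (IOException e) {
            // Other I/O failures are not masked in this sketch.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a caller that polls until the status is "Running" simply sees "New" during a coordinator restart and picks up the new coordinator URL on its next poll.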

            People

            • Assignee: Jagadish (jagadish1989@gmail.com)
            • Reporter: Jagadish (jagadish1989@gmail.com)
            • Votes: 0
            • Watchers: 2

                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 2h