  Apache Tez / TEZ-3028

Improvements to error handling


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      There are several places where exceptions can reach the Dispatcher, which can cause the AM to restart. Some of these may be legitimate, and each one needs to be evaluated.
      For example, TaskCommunicatorManager tracks known containers, among other things. When it detects an error, it throws an unchecked exception, which I believe reaches the Dispatcher directly. (An error like this would indicate a bug in the framework.) Should this trigger a restart of the AM, or should the AM shut down with an internal error?
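      One possible answer to the question above, sketched in Java: a component that tracks known containers throws an unchecked exception when it sees an unknown container, and a wrapper catches that exception before it reaches the Dispatcher and reports it as a fatal internal error. KnownContainers and GuardedHandler are hypothetical names used only for illustration; they are not Tez classes, and choosing shutdown over restart is an assumption, not a decision recorded in this issue.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for container bookkeeping of the kind done by
// TaskCommunicatorManager. Not the Tez API.
class KnownContainers {
    private final Set<String> containers = new HashSet<>();

    void register(String containerId) {
        containers.add(containerId);
    }

    void heartbeat(String containerId) {
        // An unknown container indicates a framework bug, so it is
        // signalled with an unchecked exception.
        if (!containers.contains(containerId)) {
            throw new IllegalStateException("Unknown container: " + containerId);
        }
    }
}

// Hypothetical wrapper: runs an event handler and converts any unchecked
// exception into an explicit "fatal internal error" callback instead of
// letting it propagate into the dispatcher thread.
class GuardedHandler<E> {
    private final Consumer<E> delegate;
    private final Consumer<String> onFatalError;

    GuardedHandler(Consumer<E> delegate, Consumer<String> onFatalError) {
        this.delegate = delegate;
        this.onFatalError = onFatalError;
    }

    void handle(E event) {
        try {
            delegate.accept(event);
        } catch (RuntimeException e) {
            // Assumed policy: treat this as a framework bug and report it with
            // diagnostics so the AM can shut down rather than restart.
            onFatalError.accept(e.getClass().getSimpleName() + ": " + e.getMessage());
        }
    }
}
```

      The rationale for shutting down in this sketch is that a restarted AM would probably hit the same framework bug again; whether that holds for every such exception is exactly what this issue asks to evaluate.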

      The TaskSchedulerManager handles exceptions while processing events and dispatches a generic INTERNAL_ERROR event to the DAGAppMaster. This event could be augmented with the reason for the error, so that diagnostics are displayed correctly (or are at least posted to the history service).
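      A minimal sketch of augmenting the generic error with its reason, assuming a hypothetical InternalErrorEvent class (not the actual Tez event type) that carries the exception summary and stack trace as a diagnostics string. The AM could then show this string as DAG diagnostics or publish it to the history service.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical event type: a generic internal-error event that also carries
// the reason for the error. Not Tez's actual event class.
final class InternalErrorEvent {
    private final String source;
    private final String diagnostics;

    InternalErrorEvent(String source, String diagnostics) {
        this.source = source;
        this.diagnostics = diagnostics;
    }

    // Builds the event from the exception caught while processing an event,
    // keeping both the exception summary and the full stack trace.
    static InternalErrorEvent fromThrowable(String source, String context, Throwable t) {
        StringWriter trace = new StringWriter();
        PrintWriter pw = new PrintWriter(trace);
        t.printStackTrace(pw);
        pw.flush();
        String diag = "Error in " + source + " while " + context + ": " + t + "\n" + trace;
        return new InternalErrorEvent(source, diag);
    }

    String getSource() {
        return source;
    }

    String getDiagnostics() {
        return diagnostics;
    }
}
```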

      Also, what should happen when an exception does reach the Dispatcher?
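      One possible policy for an exception that does reach the Dispatcher, sketched with a hypothetical single-threaded SketchDispatcher (not Tez's dispatcher): stop dispatching further events and hand the diagnostics to a shutdown callback, instead of continuing with possibly inconsistent state. Only unchecked exceptions (RuntimeException) are caught here; how Errors should be treated is left open.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical dispatcher used only to illustrate a policy for exceptions
// that escape a handler. Not the Tez or Hadoop dispatcher.
class SketchDispatcher<E> {
    private final Queue<E> queue = new ArrayDeque<>();
    private final Consumer<E> handler;
    private final Consumer<String> shutdownWithInternalError;
    private boolean stopped = false;

    SketchDispatcher(Consumer<E> handler, Consumer<String> shutdownWithInternalError) {
        this.handler = handler;
        this.shutdownWithInternalError = shutdownWithInternalError;
    }

    void post(E event) {
        queue.add(event);
    }

    // Drains queued events in order; returns how many were handled successfully.
    int drain() {
        int handled = 0;
        while (!stopped && !queue.isEmpty()) {
            E event = queue.poll();
            try {
                handler.accept(event);
                handled++;
            } catch (RuntimeException e) {
                // An exception reached the dispatcher: stop processing further
                // events and report the failure for an internal-error shutdown.
                stopped = true;
                shutdownWithInternalError.accept("Unhandled exception for event " + event + ": " + e);
            }
        }
        return handled;
    }

    boolean isStopped() {
        return stopped;
    }
}
```

      Stopping at the first failure trades availability for safety: events queued after the failure are never processed, which is only acceptable if the AM is about to shut down anyway.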


          People

            Assignee: Unassigned
            Reporter: Siddharth Seth (sseth)
            Votes: 0
            Watchers: 3
