Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      GPU resources are discovered when the NM bootstraps, but they are not updated through later heartbeats with the RM. There should be a monitoring mechanism that periodically checks GPU health status, together with the corresponding handling when a GPU becomes unhealthy.

      YARN-8851 will also handle device monitoring, so the two efforts may share some common parts.
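
      To make the request concrete, here is a minimal sketch of what such a periodic check could look like. It is only an illustration under stated assumptions: `GpuHealthMonitor`, the `probe` supplier (standing in for whatever actually queries the devices) and the `onChange` callback (standing in for whatever updates the resource reported to the RM) are hypothetical names, not existing YARN classes or APIs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of a periodic GPU health check; not part of YARN. */
final class GpuHealthMonitor {
    // Assumption: returns the minor numbers of the GPUs that are healthy right now.
    private final Supplier<Set<Integer>> probe;
    // Assumption: invoked with the new healthy set, e.g. to adjust what the NM reports.
    private final Consumer<Set<Integer>> onChange;
    // Last healthy set we saw; starts as the set discovered at bootstrap.
    private Set<Integer> lastHealthy;

    GpuHealthMonitor(Set<Integer> discoveredAtBootstrap,
                     Supplier<Set<Integer>> probe,
                     Consumer<Set<Integer>> onChange) {
        this.lastHealthy = new HashSet<>(discoveredAtBootstrap);
        this.probe = probe;
        this.onChange = onChange;
    }

    /** Runs one check; returns true if the healthy set changed and onChange fired. */
    synchronized boolean checkOnce() {
        Set<Integer> now = new HashSet<>(probe.get());
        if (now.equals(lastHealthy)) {
            return false;
        }
        lastHealthy = now;
        onChange.accept(new HashSet<>(now));
        return true;
    }

    /** Schedules checkOnce() at a fixed delay; the caller shuts the executor down. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Swallow so that one failed probe does not cancel later runs.
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

      Separating `checkOnce()` from the scheduling keeps the comparison logic testable without timers, and the same shape could probably serve as the common part shared with the device monitoring in YARN-8851.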

          People

            tangzhankun Zhankun Tang
            Votes: 0
            Watchers: 6
