  1. Hadoop Map/Reduce
  2. MAPREDUCE-4302

NM goes down if error encountered during log aggregation


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 2.0.0-alpha
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: nodemanager
    • Labels: None

    Description

      When a container launch request is sent to the NM, any exception thrown while initializing log aggregation causes the NM to go down. The problem can be induced by situations including, but certainly not limited to: transient RPC connection issues, missing tokens, expired tokens, permission errors, and a full or quota-exceeded DFS. It can occur with or without security enabled.

      The ramification is that an entire cluster can be brought down rather easily, whether maliciously, accidentally, or via a submission bug.
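
The failure mode described above can be illustrated with a minimal sketch of the general remedy: catch errors raised during log aggregation initialization and fail only the affected application, so the daemon itself keeps running. This is not the attached patch and does not use the real NodeManager classes; every name here (`LogAggregationInitSketch`, `NodeManagerSketch`, `LogAggregationInitializer`, `InitResult`) is hypothetical and chosen for illustration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LogAggregationInitSketch {

    /** Hypothetical stand-in for per-application log aggregation setup. */
    interface LogAggregationInitializer {
        void init(String appId) throws IOException;
    }

    /** Hypothetical outcome of initializing one application. */
    enum InitResult { STARTED, FAILED }

    /** Hypothetical, heavily simplified model of the NM; not the real class. */
    static class NodeManagerSketch {
        private final LogAggregationInitializer initializer;
        private final List<String> failedApps = new ArrayList<>();
        private boolean running = true;

        NodeManagerSketch(LogAggregationInitializer initializer) {
            this.initializer = initializer;
        }

        /** Initializes log aggregation for one application without risking the daemon. */
        InitResult initApp(String appId) {
            try {
                initializer.init(appId);
                return InitResult.STARTED;
            } catch (Exception e) {
                // Contain the failure: record it against this application only
                // and leave the daemon's running flag untouched.
                failedApps.add(appId);
                return InitResult.FAILED;
            }
        }

        boolean isRunning() { return running; }

        List<String> getFailedApps() { return failedApps; }
    }

    public static void main(String[] args) {
        // Simulate an initializer that fails for one application (e.g. an expired token).
        NodeManagerSketch nm = new NodeManagerSketch(appId -> {
            if (appId.startsWith("bad")) {
                throw new IOException("simulated expired token for " + appId);
            }
        });

        System.out.println("good_app: " + nm.initApp("good_app"));
        System.out.println("bad_app: " + nm.initApp("bad_app"));
        System.out.println("NM still running: " + nm.isRunning());
        System.out.println("failed apps: " + nm.getFailedApps());
    }
}
```

The design choice in this sketch is to treat an initialization error as an application-level failure rather than a fatal daemon error, which matches the concern that a single bad submission should not be able to take down a node.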

      Attachments

        1. MAPREDUCE-4302.patch
          27 kB
          Daryn Sharp

          People

            Assignee: Daryn Sharp (daryn)
            Reporter: Daryn Sharp (daryn)
            Votes: 0
            Watchers: 4
