Hadoop Map/Reduce / MAPREDUCE-4302

NM goes down if error encountered during log aggregation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.23.0, 2.0.0-alpha, trunk
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: nodemanager
    • Labels:
      None

      Description

      When a container launch request is sent to the NM (NodeManager), any exception thrown while log aggregation is being initialized takes the NM down. Triggers include, but are certainly not limited to: transient RPC connection issues, missing tokens, expired tokens, permission errors, and a full or quota-exceeded DFS. The problem can occur both with and without security enabled.

      As a result, an entire cluster can be brought down rather easily, whether maliciously, accidentally, or through a submission bug.
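
      To illustrate the kind of containment this report calls for, here is a minimal sketch in Java. It is not the committed patch and does not use the NodeManager's real classes: `AggregationInitializer`, `InitResult`, and `Daemon` are hypothetical names invented for this example. The idea shown is simply that an exception raised while setting up log aggregation for one application is caught and recorded against that application, so the daemon keeps serving later requests.

      ```java
      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      public class LogAggregationInitSketch {

          /** Hypothetical stand-in for the per-application log aggregation setup. */
          interface AggregationInitializer {
              void init(String appId) throws IOException;
          }

          /** Hypothetical outcome reported for one application. */
          enum InitResult { STARTED, FAILED }

          /** Hypothetical long-running daemon that must survive per-application failures. */
          static class Daemon {
              private final AggregationInitializer initializer;
              private final List<String> failedApps = new ArrayList<>();

              Daemon(AggregationInitializer initializer) {
                  this.initializer = initializer;
              }

              InitResult handleContainerLaunch(String appId) {
                  try {
                      initializer.init(appId);
                      return InitResult.STARTED;
                  } catch (Exception e) {
                      // Catch Exception rather than only IOException: the report says
                      // *any* exception during init brings the daemon down.
                      // The failure is recorded for this application only.
                      failedApps.add(appId);
                      return InitResult.FAILED;
                  }
              }

              List<String> failedApps() {
                  return failedApps;
              }
          }

          public static void main(String[] args) {
              // Simulate one of the triggers from the description (DFS quota exceeded).
              Daemon daemon = new Daemon(appId -> {
                  if (appId.equals("app_bad")) {
                      throw new IOException("DFS quota exceeded");
                  }
              });

              System.out.println("app_bad  -> " + daemon.handleContainerLaunch("app_bad"));
              // The daemon is still alive and handles the next application normally.
              System.out.println("app_good -> " + daemon.handleContainerLaunch("app_good"));
              System.out.println("failed apps: " + daemon.failedApps());
          }
      }
      ```

      In this sketch the failed application is only recorded. What a real daemon should do with it (fail the application, retry, or run without aggregation) is a separate design decision that the sketch does not address.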

          People

          • Assignee: Daryn Sharp
          • Reporter: Daryn Sharp
          • Votes: 0
          • Watchers: 5