Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.2
    • Fix Version/s: 0.23.3, 2.0.2-alpha
    • Component/s: mrv2
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The JobHistory server's locking is inconsistent, and in some cases wrong. This is not critical, because the problems only show up if a job is being cleaned up, or moved from intermediate done to done, at the same time that it is being parsed into a CompletedJob. However, the locking slows down the server in some cases and is a ticking time bomb that needs to be addressed.
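
      To illustrate the kind of race described above, here is a minimal sketch of one way to keep a move or delete from interleaving with a parse: a single per-job monitor guards every state change and is held for the whole parse. This is not the code from the attached patches. The class JobFileEntry, its State enum, its method names, and the example paths are all hypothetical.

```java
import java.util.function.Function;

// Hypothetical per-job record; not a class from the JobHistory server.
class JobFileEntry {
    enum State { INTERMEDIATE, DONE, DELETED }

    private State state = State.INTERMEDIATE;
    private String path;

    JobFileEntry(String intermediatePath) {
        this.path = intermediatePath;
    }

    /** Moves the entry to its done location; returns false if it was already moved or deleted. */
    synchronized boolean moveToDone(String donePath) {
        if (state != State.INTERMEDIATE) {
            return false;
        }
        path = donePath;
        state = State.DONE;
        return true;
    }

    /** Marks the entry deleted; later parse attempts see no file. */
    synchronized void delete() {
        state = State.DELETED;
        path = null;
    }

    /**
     * Runs the parser while holding this entry's monitor, so the path cannot
     * be moved or deleted halfway through. Returns null if already deleted.
     */
    synchronized <T> T parse(Function<String, T> parser) {
        if (state == State.DELETED) {
            return null;
        }
        return parser.apply(path);
    }

    public static void main(String[] args) {
        JobFileEntry entry = new JobFileEntry("intermediate/job_1"); // hypothetical path
        System.out.println(entry.parse(p -> "parsed " + p));
        entry.moveToDone("done/job_1");                              // hypothetical path
        System.out.println(entry.parse(p -> "parsed " + p));
        entry.delete();
        System.out.println(entry.parse(p -> "parsed " + p));
    }
}
```

      Locking per job rather than on one server-wide lock means a slow parse of one job does not block cleanup or migration of the others, which is one plausible way to address the slowdown mentioned above.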

      As part of this, we also need to make sure that the Cleaner and intermediate-to-done migration threads handle exceptions properly. Currently it appears that the exception is logged and the thread simply shuts down. This means the history server could stay up and running for weeks without ever removing old jobs.
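
      The failure mode above can be reproduced in a small sketch, assuming the periodic work is driven by a java.util.concurrent.ScheduledExecutorService (the issue does not say how the threads are scheduled). With scheduleAtFixedRate, an exception that escapes the task suppresses all later executions, so the work silently stops. Wrapping the task in a catch block keeps it running. The class ResilientPeriodicTask and its methods are hypothetical, and this is not the code from the attached patches.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of the JobHistory server.
class ResilientPeriodicTask {

    /** Wraps a task so that a failure is reported but never escapes to the scheduler. */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                // A real server would use its logger; stderr keeps the sketch self-contained.
                System.err.println("Periodic task failed, will run again next period: " + t);
            }
        };
    }

    /** Schedules a task that fails on its first call and returns how often it ran. */
    static int countRuns(boolean guard, long waitMillis) {
        AtomicInteger runs = new AtomicInteger();
        Runnable flaky = () -> {
            if (runs.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated failure");
            }
        };
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            scheduler.scheduleAtFixedRate(guard ? guarded(flaky) : flaky,
                                          0, 20, TimeUnit.MILLISECONDS);
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        // The unguarded task runs once and is then never scheduled again.
        System.out.println("unguarded runs: " + countRuns(false, 300));
        // The guarded task keeps running after its first failure.
        System.out.println("guarded runs:   " + countRuns(true, 300));
    }
}
```

      Catching Throwable is a deliberately broad choice for the sketch. A production version might catch only Exception and let fatal errors bring the process down.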

        Attachments

        1. MR-3972.txt
          50 kB
          Robert Joseph Evans
        2. MR-3972.txt
          50 kB
          Robert Joseph Evans
        3. MR-3972.txt
          49 kB
          Robert Joseph Evans
        4. MR-3972.txt
          51 kB
          Robert Joseph Evans
        5. MR-3972.txt
          75 kB
          Robert Joseph Evans
        6. MR-3972.txt
          80 kB
          Robert Joseph Evans

              People

              • Assignee:
                revans2 Robert Joseph Evans
                Reporter:
                revans2 Robert Joseph Evans
              • Votes:
                0
                Watchers:
                6

                Dates

                • Created:
                  Updated:
                  Resolved: