Hadoop Map/Reduce / MAPREDUCE-3928

App GUI Cluster->Applications UI has confusing job status report

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.23.1
    • Fix Version/s: None
    • Component/s: applicationmaster
    • Labels:
      None

      Description

      The 0.23.1 Application GUI has some potential usability issues and confusion
      points with respect to job status, from a user's perspective.

      Currently, when starting from the App UI base link of
      http://<RM_HOST>:8088/cluster, the main window shows "All Applications", and in
      the upper left corner is the "Cluster" pulldown, opened up showing the
      following list:

      About
      Nodes
      Applications

      • New
      • Submitted
      • Accepted
      • Running
      • Finished
      • Failed
      • Killed
      Scheduler

      When jobs are submitted, they show up in the pre-final states, such as Accepted
      and Running. After completion, however, all jobs are listed under FINISHED
      regardless of their final outcome (SUCCESS, KILLED or FAILED). That is
      reasonable, since any completion result qualifies as finished, but the FAILED
      and KILLED jobs do not appear under the corresponding FAILED or KILLED links.

      Clicking on FAILED or KILLED reports "No data available in table", even with
      Failed or Killed jobs showing up in the FINISHED link. This can be very
      confusing to a user checking on their jobs, especially for someone using the
      direct URL links, such as:

      http://<RM_HOST>:8088/cluster/apps/FAILED
      http://<RM_HOST>:8088/cluster/apps/KILLED

      Another potential issue is that the RM and AM each have their own
      interpretation of a job's result, so the State and FinalStatus reported in the
      Cluster Metrics display may not align with the defined states in the Cluster
      pulldown. It would be useful to clearly delineate the areas of the GUI with
      respect to the component-visible states of a user's job.

        Activity

        patrick white created issue -
        Vinod Kumar Vavilapalli added a comment -

        "Another potential issue is that the RM and AM each have their own interpretation of a job's result, so the State and FinalStatus reported in the Cluster Metrics display may not align with the defined states in the Cluster pulldown."

        This is the main point. Let me clarify:

        • The final state of the application is fundamentally different from the FinalStatus of the application.
        • FinalStatus of an application is a separate API for custom frameworks to expose their final status.
        • On the RM UI, the state always represents the state of the application.
        • In the MapReduce case, a Failed/Killed job still corresponds to a FINISHED application.

        Given that, are you expecting Failed/Killed MapReduce jobs to appear under the FAILED/KILLED applications list?

        Jason Lowe added a comment -

        Yes, I think the confusion here is that a user is primarily interested in whether their job succeeded, failed, was killed, etc. The filter links on the apps page imply one can quickly filter the apps to one of those states, yet apps that have failed or killed often don't show up under the FAILED or KILLED filters. It's not clear on the web page that those only filter State instead of the FinalState, and many users may not understand or appreciate the distinction between State and FinalState (at least for MapReduce apps).

        For example, it can be a bit confusing why a job that's killed via mapred job -kill shows up as KILLED / KILLED, while another job where the AM process is killed directly is listed as FINISHED / KILLED.
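
        The two cases above can be modeled with a small sketch. It is illustrative only: the App record, the sample IDs, and filter_by_state are hypothetical names, not ResourceManager code, and the (State, FinalStatus) pairs are the ones reported in this comment. Assuming the filter links match on State alone, it shows why the KILLED filter would return only one of the two killed jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    app_id: str        # hypothetical identifier, for illustration only
    state: str         # application State, as shown in the RM UI
    final_status: str  # FinalStatus reported by the framework (the AM)


# (State, FinalStatus) pairs taken from the two cases described above.
APPS = [
    App("job_killed_via_cli", state="KILLED", final_status="KILLED"),
    App("am_process_killed", state="FINISHED", final_status="KILLED"),
]


def filter_by_state(apps, state):
    """Return the IDs of apps whose State column equals `state`."""
    return [a.app_id for a in apps if a.state == state]


killed = filter_by_state(APPS, "KILLED")
finished = filter_by_state(APPS, "FINISHED")
print("KILLED filter:  ", killed)
print("FINISHED filter:", finished)
```

        Under this assumption, the job whose AM process was killed appears only under FINISHED, even though its FinalStatus is KILLED.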

        Anupam Seth made changes -
        Field: Assignee
        Original Value: (empty)
        New Value: Anupam Seth [ anupamseth ]
        Vinod Kumar Vavilapalli added a comment -

        "yet apps that have failed or killed often don't show up under the FAILED or KILLED filters"

        I think you mean "jobs that have failed or killed".

        "it can be a bit confusing why a job that's killed via mapred job -kill shows up as KILLED / KILLED"

        I am planning to change this. "mapred job -kill" should only affect job-state. We need to have an "app -kill" which works at an application level. I already have a WIP patch from some time back which we should get in.

        "many users may not understand or appreciate the distinction between State and FinalState (at least for MapReduce apps)."

        This is the fundamental problem we should address. It is happening because the concept of an application is new for MapReduce users.

        IMO, we need to educate users about the distinction between applications vs jobs and application-state vs job-state.

        If we agree on that, we can close this as invalid and work on improving the documentation to make the distinction between application and job very explicit.

        Anupam Seth added a comment -

        Agree that it is an issue of user education for now and hence closing the JIRA.

        In the future, though, we could look at ways to reduce the learning curve for users transitioning from the 1.0 / 0.20 world, such as an MR-specific GUI or easier fixes such as column-based filtering on the RM. That is lower priority, and the community can decide on the best approach through an enhancement request at that time.
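
        As a rough idea of what column-based filtering could look like, the sketch below filters application rows on either column. Everything in it is an assumption made for illustration: the row layout, the column names state and final_status, the sample rows, and the filter_rows helper are hypothetical and do not describe an existing RM feature.

```python
# Hypothetical sample rows; the column names are assumptions, not the RM's schema.
ROWS = [
    {"id": "app_1", "state": "FINISHED", "final_status": "SUCCEEDED"},
    {"id": "app_2", "state": "FINISHED", "final_status": "FAILED"},
    {"id": "app_3", "state": "FINISHED", "final_status": "KILLED"},
    {"id": "app_4", "state": "KILLED", "final_status": "KILLED"},
]

FILTERABLE_COLUMNS = ("state", "final_status")


def filter_rows(rows, column, value):
    """Return the IDs of rows whose `column` equals `value`."""
    if column not in FILTERABLE_COLUMNS:
        raise ValueError(f"cannot filter on column {column!r}")
    return [row["id"] for row in rows if row[column] == value]


# Filtering on State misses the failed job; filtering on FinalStatus finds it.
by_state = filter_rows(ROWS, "state", "FAILED")
by_final_status = filter_rows(ROWS, "final_status", "FAILED")
print(by_state, by_final_status)
```

        With a filter like this, a user looking for failed jobs could select the final-status column directly instead of relying on the application State links.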

        Anupam Seth made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: Won't Fix [ 2 ]
        Transition: Open -> Resolved
        Time In Source Status: 44d 5h 44m
        Execution Times: 1
        Last Executer: Anupam Seth
        Last Execution Date: 11/Apr/12 23:48

          People

          • Assignee: Anupam Seth
          • Reporter: patrick white
          • Votes: 0
          • Watchers: 3
