Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1720

'Killed' jobs and 'Failed' jobs should be displayed seperately in JobTracker UI

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels:
    • Environment:

      all

      Description

      The JobTracker UI shows both Failed/Killed Jobs as Failed. The Killed job status has been separated from Failed as part of HADOOP-3924, so the UI needs to be updated to reflect the same..

        Activity

        Hide
        amareshwari Amareshwari Sriramadasu added a comment -

        Changed fix version to 0.20.3, because 0.20.2 is already released.

        Show
        amareshwari Amareshwari Sriramadasu added a comment - Changed fix version to 0.20.3, because 0.20.2 is already released.
        Hide
        yhemanth Hemanth Yamijala added a comment -

        Maybe we can group all these jobs under a table called 'Unsuccessful Jobs' and have a column to indicate whether they were killed or failed ?

        Show
        yhemanth Hemanth Yamijala added a comment - Maybe we can group all these jobs under a table called 'Unsuccessful Jobs' and have a column to indicate whether they were killed or failed ?
        Hide
        qwertymaniac Harsh J added a comment -

        Attaching a patch that changes the "Failed jobs" in the UI to "Unsucessful jobs" and displays a reason column that clearly indicates whether the failed job failed on its own or got killed.

        Attached PNG image shows a screenshot of the same while executing mapred/TestJobKillAndFail

        [Also cleaned up the JSPUtil.generateJobTable(...) method as I was modifying it.]

        Show
        qwertymaniac Harsh J added a comment - Attaching a patch that changes the "Failed jobs" in the UI to "Unsucessful jobs" and displays a reason column that clearly indicates whether the failed job failed on its own or got killed. Attached PNG image shows a screenshot of the same while executing mapred/TestJobKillAndFail [Also cleaned up the JSPUtil.generateJobTable(...) method as I was modifying it.]
        Hide
        amar_kamat Amar Kamat added a comment -

        Should we name the column named "Reason" as "State" and add a "Reason" column indicating the reason why the job was failed/killed? In the failed section, we can mention the reason for failure (assuming the reason is stored in the job status). As an improvement, we can mention some more details like who [user+group] killed the job etc. Thoughts?

        Show
        amar_kamat Amar Kamat added a comment - Should we name the column named "Reason" as "State" and add a "Reason" column indicating the reason why the job was failed/killed? In the failed section, we can mention the reason for failure (assuming the reason is stored in the job status). As an improvement, we can mention some more details like who [user+group] killed the job etc. Thoughts?
        Hide
        qwertymaniac Harsh J added a comment -

        I think that's a good idea. Jobs can be killed for a reason (say, hadoop job -kill JobID <reason why we're killing it>?), which can be included into the JobStatus data, same with the whois of the killer; but when it comes to failure, how do we deduce the 'reason' of failure – just task numbers as explained in the MAPREDUCE-343 ticket?

        I also think that for Unsuccessful Jobs, displaying map and reduce progress percentages is not a good thing, as it is not very indicative of the actual progress (Always shows 100 or 0). We could remove this and claim some good real estate to display reasons (limited characters of it).

        Show
        qwertymaniac Harsh J added a comment - I think that's a good idea. Jobs can be killed for a reason (say, hadoop job -kill JobID <reason why we're killing it>?), which can be included into the JobStatus data, same with the whois of the killer; but when it comes to failure, how do we deduce the 'reason' of failure – just task numbers as explained in the MAPREDUCE-343 ticket? I also think that for Unsuccessful Jobs, displaying map and reduce progress percentages is not a good thing, as it is not very indicative of the actual progress (Always shows 100 or 0). We could remove this and claim some good real estate to display reasons (limited characters of it).
        Hide
        amar_kamat Amar Kamat added a comment -

        I think the reason for keeping the % information in the failed section is to keep the table consistent with the running jobs table. I would like to know what other think about this. Maybe as an enhancement, we can separate out running/failed/killed jobs into separate tabs (jsp tabs?). Going forward, we might want to display some statistics related to each section which should not clutter the main (navigation?) page. History can also be displayed as one of the tabs.

        Show
        amar_kamat Amar Kamat added a comment - I think the reason for keeping the % information in the failed section is to keep the table consistent with the running jobs table. I would like to know what other think about this. Maybe as an enhancement, we can separate out running/failed/killed jobs into separate tabs (jsp tabs?). Going forward, we might want to display some statistics related to each section which should not clutter the main (navigation?) page. History can also be displayed as one of the tabs.
        Hide
        qwertymaniac Harsh J added a comment -

        I actually like the concept of having all (retained) jobs listed in one page. This way, if you are monitoring one, and it actually fails or is killed, it still remains on the same page; not requiring an inquisitive search for where it really went. With browser search, one can also lookup the ID on a single page, without having to switch any context for the resultant state (be it tab or page). But yes, the representation could use a little work as the pages tend to get longer and longer over load/time.

        About job history/details, since the current job details page gets pretty large thanks to counters and charts, it wouldn't look good even if it were expanded inline, I think. Although we can have a short summary (defn.?) which can be shown inline when clicked/hovered.

        Show
        qwertymaniac Harsh J added a comment - I actually like the concept of having all (retained) jobs listed in one page. This way, if you are monitoring one, and it actually fails or is killed, it still remains on the same page; not requiring an inquisitive search for where it really went. With browser search, one can also lookup the ID on a single page, without having to switch any context for the resultant state (be it tab or page). But yes, the representation could use a little work as the pages tend to get longer and longer over load/time. About job history/details, since the current job details page gets pretty large thanks to counters and charts, it wouldn't look good even if it were expanded inline, I think. Although we can have a short summary (defn.?) which can be shown inline when clicked/hovered.
        Hide
        qwertymaniac Harsh J added a comment -

        Amar, it would seem that what you ask for may be present in NG's MAPREDUCE-2399

        Am not sure how that ticket affects this and all other pending UI issues, but it looks like a welcome change from the usual JSP pages.

        Show
        qwertymaniac Harsh J added a comment - Amar, it would seem that what you ask for may be present in NG's MAPREDUCE-2399 Am not sure how that ticket affects this and all other pending UI issues, but it looks like a welcome change from the usual JSP pages.
        Hide
        qwertymaniac Harsh J added a comment -

        I can provide an updated patch for 0.20-security branch if there's still interest in this.

        Show
        qwertymaniac Harsh J added a comment - I can provide an updated patch for 0.20-security branch if there's still interest in this.
        Hide
        subru Subru Krishnan added a comment -

        @Harsh the patch would be quite useful for us, so it would be great if you can provide it. Thanks.

        Show
        subru Subru Krishnan added a comment - @Harsh the patch would be quite useful for us, so it would be great if you can provide it. Thanks.
        Hide
        qwertymaniac Harsh J added a comment -

        Looks like this may be required/useful for 0.22 sub releases per Arun's label.

        Show
        qwertymaniac Harsh J added a comment - Looks like this may be required/useful for 0.22 sub releases per Arun's label.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12467289/mapred.failed.killed.difference.png
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1475//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467289/mapred.failed.killed.difference.png against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1475//console This message is automatically generated.
        Hide
        qwertymaniac Harsh J added a comment -

        Given that the UI does indicate in the reason if its FAILED or KILLED, and MR2's UIs (YARN's and JHS' both) does not have this specific issue anymore today, am closing this as Not A Problem. Feel free to reopen and rebase the patch if you feel such a change is very worthy on the sustaining MR1 side.

        Show
        qwertymaniac Harsh J added a comment - Given that the UI does indicate in the reason if its FAILED or KILLED, and MR2's UIs (YARN's and JHS' both) does not have this specific issue anymore today, am closing this as Not A Problem. Feel free to reopen and rebase the patch if you feel such a change is very worthy on the sustaining MR1 side.
        Hide
        bruce.ni nixuchi added a comment -

        good luck!

        Show
        bruce.ni nixuchi added a comment - good luck!

          People

          • Assignee:
            qwertymaniac Harsh J
            Reporter:
            subru Subru Krishnan
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development