Hadoop Map/Reduce
MAPREDUCE-323

Improve the way job history files are managed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.20.203.0
    • Component/s: jobtracker
    • Labels:
      None
    • Release Note:
      This patch does four things:

          * it changes the directory structure of the done directory that holds history logs for jobs that are completed,
          * it builds toy databases for completed jobs, so we no longer have to scan 2N files on DFS to find out facts about the N jobs that have completed since the job tracker started [which can be hundreds of thousands of files in practical cases],
          * it changes the job history browser to display more information and allow more filtering criteria, and
          * it creates a new programmatic interface for finding files matching user-chosen criteria. This allows users to no longer be concerned with our methods of storing them, in turn allowing us to change those at will.

      The new API referred to above, which can be used to programmatically obtain history file Paths given search criteria, is sketched below:

          package org.apache.hadoop.mapreduce.jobhistory;
          ...

          // this interface is within O.A.H.mapreduce.jobhistory.JobHistory:

          // holds information about one job history log in the done
          // job history logs
          public static class JobHistoryJobRecord {
             public Path getPath() { ... }
             public String getJobIDString() { ... }
             public long getSubmitTime() { ... }
             public String getUserName() { ... }
             public String getJobName() { ... }
          }

          public class JobHistoryRecordRetriever implements Iterator<JobHistoryJobRecord> {
             // usual Interface methods -- remove() throws UnsupportedOperationException
             // returns the number of calls to next() that will succeed
             public int numMatches() { ... }
          }

          // returns a JobHistoryRecordRetriever that delivers the Paths of all matching job history files,
          // in no particular order. Any criterion that is null or the empty string does not constrain.
          // All criteria that are specified are applied conjunctively, except that if there is more than
          // one date, you retrieve all Paths matching ANY of the dates.
          // soughtUser and soughtJobid must match exactly.
          // soughtJobName can match the entire job name or any substring.
          // dates must be exactly in the format MM/DD/YYYY.
          // Dates' leading digits must be 2's; we're incubating a Y3K problem.
          public JobHistoryRecordRetriever getMatchingJob
              (String soughtUser, String soughtJobName, String[] dateStrings, String soughtJobid)
            throws IOException
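
      For illustration, a hypothetical caller of this API might look like the
      following sketch. The jobHistory instance and the search values are
      assumptions for this example; obtaining a JobHistory instance is not
      covered by this note.

          // hypothetical usage sketch -- "jobHistory" is an assumed JobHistory
          // instance; the user, job name substring, and date are example values.
          // getMatchingJob throws IOException, so callers must handle it.
          JobHistoryRecordRetriever retriever = jobHistory.getMatchingJob(
              "amar", "sort", new String[] {"08/25/2010"}, null);
          System.out.println(retriever.numMatches() + " matching history files");
          while (retriever.hasNext()) {
            JobHistoryJobRecord record = retriever.next();
            System.out.println(record.getJobIDString() + " -> " + record.getPath());
          }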


      Description

      Today all the job history files are dumped in one job-history folder. This can cause problems when there is a need to search the history folder (job recovery etc.). It would be nice if we grouped all the jobs under a user folder, so all the jobs for user amar would go in history-folder/amar/. Jobs can be categorized using various features like job id, date, and job name, but using the username will make the search much more efficient and will not result in a namespace explosion.

      1. MR323--2010-08-20--1533.patch
        75 kB
        Dick King
      2. MR323--2010-08-25--1632.patch
        75 kB
        Dick King
      3. MR323--2010-08-27--1359.patch
        79 kB
        Dick King
      4. MR323--2010-08-27--1613.patch
        81 kB
        Dick King
      5. MR323--2010-09-07--1636.patch
        79 kB
        Dick King

        Issue Links

          Activity

          Arun C Murthy added a comment -

          Part of 0.20.203, no point in doing this for MRv2

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12454057/MR323--2010-09-07--1636.patch
          against trunk revision 1139400.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestMRCLI
          org.apache.hadoop.fs.TestFileSystem

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/423//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/423//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/423//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12454057/MR323--2010-09-07--1636.patch
          against trunk revision 1075422.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/101//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/101//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/101//console

          This message is automatically generated.

          Ted Yu added a comment -

          In 0.20.6, we first obtain the list of failed tasks from jobfailures.jsp.
          Here is a sample URL:

          http://sjc1.ciq.com:50030/jobfailures.jsp?jobid=job_201012031753_0116&cause=failed
          

          Then we select N tasks from this list and log the last 4KB of task logs in our own flow log.

          Dick King added a comment -

          This is a new patch, exactly like the old one except that I removed the TestJobCleanup mod. That mod is now on MAPREDUCE-2032.

          Dick King added a comment -

          I split out a piece, which I modified and then attached to MAPREDUCE-2032.

          Dick King added a comment -

          The requirement is only that if you run this patch in its current state, a complete test run will fail in TestJobOutputCommitter – as would the null patch.

          Amareshwari Sriramadasu added a comment -

          TestJobCleanup was leaving behind files that were causing TestJobOutputCommitter to fail. I fixed that.

          Dick, can you put this patch on MAPREDUCE-2032? We can get it in faster there, as this jira will take more time for reviews.

          Dick King added a comment -

          I also fixed a problem with TestJobCleanup, which without this fix leaves files in a temp directory, trashing a subsequent TestJobOutputCommitter run if there is one before the temp directory is cleared. It's very annoying to have tests that fail in a full unit test run but not in isolation.

          Dick King added a comment -

          The only tests that fail are TestTaskTrackerLocalization and TestTaskLauncher, which fail in trunk as well.

          Dick King added a comment -

          There was a testing problem.

          TestJobCleanup was leaving behind files that were causing TestJobOutputCommitter to fail.

          I fixed that.

          Dick King added a comment -

          Problems fixed. ant test in progress.

          Dick King added a comment -

          Cancelling, pending investigation of some unit test failures:

          TestRumenJobTraces – the test case should use the new API

          TestJobOutputCommitter
          TestTaskLauncher

          I expect to fix these in a few hours.

          Dick King added a comment -

          This file is a new patch incorporating some of Krishna's review comments.

          What follows is a suggested release note:

          This patch does four things:

          • it changes the directory structure of the done directory that holds history logs for jobs that are completed,
          • it builds toy databases for completed jobs, so we no longer have to scan 2N files on DFS to find out facts about the N jobs that have completed since the job tracker started [which can be hundreds of thousands of files in practical cases],
          • it changes the job history browser to display more information and allow more filtering criteria, and
          • it creates a new programmatic interface for finding files matching user-chosen criteria. This allows users to no longer be concerned with our methods of storing them, in turn allowing us to change those at will.

          The new API referred to above, which can be used to programmatically obtain history file Paths given search criteria, is sketched below:

              package org.apache.hadoop.mapreduce.jobhistory;
              ...
          
              // within JobHistory:
          
              // holds information about one job history log in the done
              //   job history logs
              public static class JobHistoryJobRecord {
                 public Path getPath() { ... }
                 public String getJobIDString() { ... }
                 public long getSubmitTime() { ... }
                 public String getUserName() { ... }
                 public String getJobName() { ... }
              }
          
              public class JobHistoryRecordRetriever implements Iterator<JobHistoryJobRecord> {
                 // usual Interface methods -- remove() throws UnsupportedOperationException
                 // returns the number of calls to next() that will succeed
                 public int numMatches() { ... }
              }
          
              // returns a JobHistoryRecordRetriever that delivers the Paths of all matching job history files,
              // in no particular order.  Any criterion that is null or the empty string does not constrain.
              // All criteria that are specified are applied conjunctively, except that if there is more than
              // one date, you retrieve all Paths matching ANY of the dates.
              // soughtUser and soughtJobid must match exactly.
              // soughtJobName can match the entire job name or any substring.
              // dates must be exactly in the format MM/DD/YYYY.
              // Dates' leading digits must be 2's; we're incubating a Y3K problem.
              public JobHistoryRecordRetriever getMatchingJob
                  (String soughtUser, String soughtJobName, String[] dateStrings, String soughtJobid)
                throws IOException 
          
          Dick King added a comment -

          It is not used and I'll remove it.

          Kind of a minor point ... saves about four executed lines on an error
          path that normally never happens ... but I'll make the change.

          Because when we create a new subdirectory within the new runnable within
          moveToDone(JobID), the thread waits until enough time passes that that
          subdirectory will never have any entries added again, and then it writes
          the index. That ties up a thread, so we need an additional one to move
          the mail.

          Indeed it isn't.

          I left it in the parameter chain because future code changes may use it.
          In particular we might place a ceiling on how many jobs there could
          ever come to be in one subdirectory, and that would take a JobID to enforce.

          Actually, I have it backwards. We're indexing on every call, whether it
          needs it or not, which is bad. I'll fix this.

          Yeah, when I abstracted out buildIndex I didn't delete enough code from
          the inline.

          I make the 5-minute checkpoints because there is a small exposure:
          some history logs might not get indexed after a job tracker crash.
          This measure reduces that exposure.

          I made the busy wait loop 30 seconds, rather than one second, and on
          every pass to reduce the load and to make this code run only as often as
          it needs to. However, I therefore increased the thread pool size to THREE:

          1 to be in the loop waiting for the hour to end,

          1 to be obsolete because the hour already ended during its half minute
          but it doesn't realize it yet, and

          1 to copy a history file.

          That's three. If we're in the usual case where there is only one
          instance busy-waiting, then two instances might flow into the copying
          code. This is harmless but not useful [since the whole copying code is
          run with the lock taken].
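
          For concreteness, here is a minimal sketch of the sizing argument
          above. The field name and construction are assumptions for
          illustration; the patch's actual wiring may differ:

              import java.util.concurrent.ExecutorService;
              import java.util.concurrent.Executors;

              // assumed construction: three threads cover (1) the waiter for the
              // end of the hour, (2) a stale waiter that has not yet noticed the
              // hour ended, and (3) one thread copying a history file to DONE
              ExecutorService historyMoveExecutor = Executors.newFixedThreadPool(3);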

          Krishna Ramachandran added a comment -

          A few comments to start with:

          JobHistory.java:

              private static final SortedMap<Long, String> jobToDirectoryMap
                  = new TreeMap<Long, String>();

          how is this used?

              public String getConfFilePath(JobID jobId) {
                MovedFileInfo info = jobHistoryFileMap.get(jobId);
                if (info == null) {
                  return null;
                }
                final Path historyFileDir
                    = (new Path(getHistoryFilePath(jobId))).getParent();
                return getConfFile(historyFileDir, jobId).toString();
              }

          instead, "info" already has this data? info.historyFile?

          I suggest a simple modification to setupEventWriter:

              public void setupEventWriter(JobID jobId, JobConf jobConf)
                  throws IOException {
                if (logDir == null) {
                  LOG.info("Log Directory is null, returning");
                  throw new IOException("Missing Log Directory for History");
                }

                MetaInfo oldFi = fileMap.get(jobId);

                long submitTime = (oldFi == null ? System.currentTimeMillis() : oldFi.submitTime);

                String user = getUserName(jobConf);
                String jobName = getJobName(jobConf);
                ....

          On ThreadPoolExecutor - why the increased pool size?

          canonicalHistoryLogDir(JobId, ...)

          jobId is not used in the following:

              canonicalHistoryLogDir(

          In this block:

              synchronized (ueState) {
                ......
              + iShouldMonitor = true;
              +
              + ueState.unindexedElements = new LinkedList<JobHistoryIndexElement>();
              + ueState.currentDoneSubdirectory = resultDir;
              +
              + ueState.monitoredDirectory = resultDir;
                .....
              + ueState.unindexedElements.
              +     add(new JobHistoryIndexElement(millisecondTime, id, metaInfo));

          This code is not entirely clear.
          Should we increment the count here? unindexedElementCount++
          unindexedElements

          Related item:
          get/addUnindexedElements() - who calls these?

          In class UnindexedElementsState.closeCurrentDirectory(),

              OutputStream newIndexOStream = null;
              PrintStream newIndexPStream = null;

          are unused.

              + // time, because iShouldMonitor is only set true when
              + // ueState.monitoredDirectory changes, which will force the
              + // current incumbent to abend at the earliest opportunity.
              + while (iShouldMonitor) {
              +   int roundCounter = 0;
              +
              +   int interruptionsToAbort = 2;
              +
              +   try {
              +     Thread.sleep(1000);
              +   } catch (InterruptedException e) {
              +     if (--interruptionsToAbort == 0) {
              +       return;
              +     }
              +   }
              +
              +   synchronized (ueState) {
              +     if (ueState.monitoredDirectory != resultDir) {
              +       // someone else closed out the directory I was monitoring
              +       iShouldMonitor = false;
              +     } else if (++roundCounter % 30 == 0) {
              +       interruptionsToAbort = 2;
              +

          is in a busy wait loop with an arbitrary 1 sec sleep. This check can go up to a maximum of 1 hour?
          The 5 minute checkpoint does not set anything?

              } else if (++roundCounter % 300 == 0) {
                // called for side effect – a 5 minute checkpoint
                try {
                  ueState.getACurrentIndex(ueState.currentDoneSubdirectory); // why?
                } catch (IOException e) {
                  LOG.warn("Couldn't build an interim Job History index for "
                      + ueState.currentDoneSubdirectory);
                }

          Dick King added a comment -

          The API created by the patch differs from the one I described on July 30 in a few minor ways:

          • The name PathCow is undignified. I'm using something more reasonable.
          • You can search for specific job IDs.
          • We do not define the order in which the jobs' Paths are output.
          Dick King added a comment -

          I still have some testing to do, but I think this patch does the job.

          I invite comment.

          Note there is a programmatic API to get this information. See JobHistory.JobHistoryRecordRetriever, JobHistory.JobHistoryJobRecord, and JobHistory.getMatchingJobs.

          Dick King added a comment -

          Unfortunately, since the job sequence numbers will be assigned at job start time but the DONE directory is built as jobs complete, it's not the case that we'll only be filling one serial number block at once. They will, however, be closed when the day is complete.

          Dick King added a comment -

          I need to modify getMatchingJob(String, String, String[]) in my comment of 28/Jul/10 03:09 PM as follows:

              class PathCow implements Iterator<Path> {
                  // Iterator<Path> methods
          
                  int numberMatches();
                  // returns number of matches you could get if you drive the Iterator to
                  // the end.  Might be an approximation.        
              }
          
              PathCow getMatchingJob
                       (String user, String jobnameSubstring, String[] dateStrings, boolean backwards)
                    throws IOException
                // has no remove() method
                // any criterion can be null
                // filtering is conjunctive
                // dates are MM/DD/YYYY 
                // results happen approximately oldest first [or newest first,
                //    if backwards is true]
                // a new file that gets added after the iterator is created can either be
                //    or not be delivered by the result
                // dates are approximations of completion time
          
          Sharad Agarwal added a comment -

          If we go with index files, would it be useful to have some basic but most-accessed information, like completion state, start time, end time, and job name, available in the index file? This avoids the need to parse the history files and will give rich browsing on web UIs. Maybe a follow-up JIRA for this.

          Dick King added a comment -

          Sorry, I left out some context for my previous comment.

          1: People sometimes ask for accessors. This is especially likely as we change directory formats and may change them again in the future.

          2: I hope that no code has to scan the directories at all with this API.

          Dick King added a comment -

          Here are the new APIs I propose:

          All are static public member functions of JobHistory .

          All methods return only items from the done directory. Techniques for

              Path getJobHistoryPath(JobID id) throws IOException
          
              Path jobPathToConfPath(Path jobPath) throws IOException 
                // works in memory at computer speed.  Pledges to not read the file.
                // for a syntactically legal Path that doesn't correspond to an actual
                // job, can either return the corresponding conf Path that also won't
                // exist, or throw an exception.
          
              Iterator<Path> getMatchingJob
                       (String user, String jobnameSubstring, String[] dateStrings)
                    throws IOException
                // has no remove() method
                // any criterion can be null
                // filtering is conjunctive
                // dates are MM/DD/YYYY 
                // results happen in an arbitrary order
                // a new file that gets added after the iterator is created can either be
                //   or not be delivered by the result
                // dates are approximations of completion time
          
          
          Allen Wittenauer added a comment -

          Since I don't have an army of programmers building a metrics system like Simon, I'll likely just continue doing what I'm doing now: using a perl script to find the log files using a regex over the directory structure and manipulate them that way. As long as I don't have to have Java and all the information that is currently available remains available, then I probably don't care.

          It might be helpful, however, if you put up a diagram of your directory structure.

          Doug Cutting added a comment -

          Are the index files required? They reduce the amount of directory enumeration on the namenode, but is that a system bottleneck? An optimization that also might be important is packing these into archives, to preserve namespace. But I question whether the initial implementation should contain either optimization.

          Dick King added a comment -

          I'm seeking comments from the community, on https://issues.apache.org/jira/browse/MAPREDUCE-323?focusedCommentId=12891928&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891928. Please? I would like to start bending metal soon.
          Dick King added a comment -

          I would like to retract the suggestion that job history log file names be shortened during the creation of the database; see 5a in https://issues.apache.org/jira/browse/MAPREDUCE-323?focusedCommentId=12891928&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891928.

          That's a race condition waiting to happen, and not really all that beneficial.

          Dick King added a comment -

          A PROPOSAL

          introduction

          The way the completed job history file system now works is that when a job is started, an empty history file is created by the job tracker. The name of the file contains enough information about the job to let an application tell whether the file documents a job that satisfies a search criterion. In particular, it includes the job tracker instance ID, the job ID, the user name, and the job name.

          As the job progresses, records get added to the file, and when the job is finished [whether it succeeded or failed] the file is moved to another directory, the completed job history files directory [the "DONE directory"]. Currently this directory has a simple flat structure. If an application [in particular, the job history browser] wants some job histories, it reads this directory and chooses the files with names that indicate that the files will meet the criteria. In practical cases this can include hundreds of thousands or even a million files. Note that each job is represented by two files, the history file and the config file, doubling the burden on the name node.

          proposal

          I would like to implement a simple database to solve this problem. My proposal has the following features:

          1: The DONE directory will contain subdirectories, each containing a few hundred or a thousand files.

          2: At any time, the job tracker will be filling one of the DONE directory's subdirectories. All the rest are closed out, never to be added to again.

          3: The subdirectories have a naming scheme so they're created in lexicographical order. We would like to use subdirectory names like 2010-07-23--0000, etc. [the four digits are a serial number, not an HHMM field].

          4: When the job tracker decides to bind off a subdirectory and start a new one, it creates a new index file in the subdirectory it's closing out. That index is a simple list of the history files the directory contains.

          4a: The job tracker starts a new subdirectory whenever the first history file is copied on a given day, and whenever the current subdirectory would otherwise contain more than a certain number of files.

          4b: Perhaps the files can be renamed? These files' names are a few dozen characters each, and in a system that has run a half million jobs the names collectively occupy 100+ megabytes in the name node. Significant, but not decisive.

          4b1: 4b would require that rumen understand indices.

          5: The processing is:

          5a: [optional] create a new short name for every file in the subdirectory that's being closed out

          5a1: The job tracker keeps this information in memory. It doesn't need to read the directory.

          5b: Write out the index file in a temporary location temp-index within the directory it's indexing.

          5b1: The index contains all of the names in text form [if 5a is not used], or all pairs of { long name, short name } in text form if we are shortening the names.

          5c: rename the temp-index file to index when it's done [a sketch of steps 5b and 5c appears after this list]

          5d: [optional] If we choose file renaming, delete all of the long names.

          6: When doing a search, we

          6a: determine all subdirectories of the DONE directory

          6b: see which ones have an index

          6c: read each index that exists, and

          6d: read all of the files, for the subdirectories that don't have indices yet.

          7: To aid retirement of old job history files, the job tracker always binds off the current subdirectory when the date changes, even if it doesn't have very many files, and we retire files on date boundaries, a subdirectory at a time. The relevant date is the date that the file is being moved, which is normally a short time after the job is completed.

          8: [optional] We may want to consolidate the indices of a completed day in a per-day index written as a file directly under the done directory.
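
          Here is a minimal sketch of the write-then-rename in steps 5b and 5c.
          The class and method names, the list argument, and the exact index
          contents are assumptions for illustration, not the eventual patch:

              import java.io.IOException;
              import java.util.List;
              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FSDataOutputStream;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              class HistoryIndexWriter {
                // write the index under a temporary name, then rename it, so a
                // reader never observes a partially written index file
                static void writeIndex(Configuration conf, Path subdir, List<String> names)
                    throws IOException {
                  FileSystem fs = subdir.getFileSystem(conf);
                  Path tempIndex = new Path(subdir, "temp-index");
                  FSDataOutputStream out = fs.create(tempIndex, true);
                  try {
                    for (String name : names) {
                      out.writeBytes(name + "\n");  // one history file name per line
                    }
                  } finally {
                    out.close();
                  }
                  fs.rename(tempIndex, new Path(subdir, "index"));  // step 5c
                }
              }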

          Dick King added a comment -

          On a separate issue, I received a suggestion that we have a distinguished file extension such as .jhist for job history files. FileSystem.globStatus(Path) can select the config .xml files, but

          1: This is not perfect, as the text ".xml" can in principle occur at the end of a job history file name, and

          2: the globbing language does not have a "give me all files except" operator.
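
          For illustration, here is a hedged sketch of the glob approach; the
          done-directory location and the "*.xml" pattern are assumptions for
          this example:

              import org.apache.hadoop.conf.Configuration;
              import org.apache.hadoop.fs.FileStatus;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              // illustration only: select config files by their .xml suffix; the
              // glob language offers no "everything except *.xml", which is the
              // limitation in point 2 above
              Configuration conf = new Configuration();
              Path doneDir = new Path("/mapred/history/done");  // assumed location
              FileSystem fs = doneDir.getFileSystem(conf);
              FileStatus[] confFiles = fs.globStatus(new Path(doneDir, "*.xml"));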

          Dick King added a comment -

          I believe that it is agreed that we need a directory structure other than a single directory holding all of the history files.

          That being said, the question is how the directory tree should be organized.

          The use cases are:

          1: There is a job history web API, implemented by jobhistory.jsp, that allows users to search the job history files to retrieve information on single or multiple jobs meeting certain criteria. In particular, web users can search for jobs with a certain user, and jobs whose job name contains a certain substring.

          After a search, the current API allows the user to page through the data. They get told the total number of matching jobs, and they can browse pages of data, with 100 jobs per page. They can access the first and last page from any page, and from any page they can access any of the previous or following five pages [if there are that many].

          2: During restart, we perform searches for specific quadruples of jobtracker ID, job ID, username, and jobname. This may be redundant, but that's what we do in the current code base.

          3: I understand that some installations archive tranches of job history files periodically, usually by date.

          Here is how I support the claim that these use cases are covered, with considerable scaling and responsiveness improvements:

          1: If I use a subdirectory structure based on jobtracker IDs, then dates, then high-order digits of the jobid serial number, the performance of each of these three usage cases can be improved. I described potential improvements for use case 1 on 14/Jun/10 at 09:38 PM. To summarize, you will be able to browse by dates and time ranges as well as by the other criteria, and performance will be improved because we only search the subset of the directories we need to satisfy the query or to present the first page of the results.

          If we make changes along these lines we will no longer present to the user the total number of matching jobs. One of the complaints that led to this jira is, after all, the possibility of a scaling problem if there are too many jobs.

          2: Because of directory restrictions, the namenode will have to generate a lot less data, and there will be a lot less client-side filtering as well if you have directories consisting of only 1000 jobs [2000 files].

          3: We could archive a day's results by harchiving a date subdirectory.

          Hide
          Doug Cutting added a comment -

          > After some discussions, we've come to some decisions.

          Do you mean the discussion above, or some other discussion? Decisions are made in public. Do you mean you have a new proposal?

          Dick King added a comment -

          After some discussions, we've come to some decisions.

          1: We'll store the completed jobs' history files in the DFS done history files tree, in the following fixed format:

          DONE/job-tracker-instance-ID/YYYY/MM/DD/987654/

          The job tracker instance ID includes both the job tracker machine name and the epoch time of the instance start. There won't be very many directories on this level.

          YYYY/MM/DD documents the date of completion [actually, the date that the history file is copied to DFS].

          987654 stands for the leading six digits of the job serial number, considered as a nine-digit integer. The leading zeros ARE included, so the directories can be enumerated correctly in lexicographical order. Therefore, no directory will have more than 2000 files, except in the unlikely case that there are more than 2 million jobs in one day.
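
          As an illustrative sketch [not the committed code; names are hypothetical], deriving the done-directory subpath from a job's serial number could look like this:

              import java.util.Calendar;

              // derive DONE/<jt-instance>/YYYY/MM/DD/<leading six digits of serial>,
              // treating the serial number as a nine-digit, zero-padded integer
              static String doneSubdirectory(String jtInstanceId, Calendar completed, int jobSerial) {
                String serial9 = String.format("%09d", jobSerial);     // keep leading zeros
                return String.format("DONE/%s/%04d/%02d/%02d/%s",
                    jtInstanceId,
                    completed.get(Calendar.YEAR),
                    completed.get(Calendar.MONTH) + 1,                 // Calendar months are 0-based
                    completed.get(Calendar.DAY_OF_MONTH),
                    serial9.substring(0, 6));                          // leaf covers 1000 serials
              }

          Each leaf directory then covers 1000 consecutive serial numbers, which is where the 2000-file bound [two files per job] comes from.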

          2: We will modify the web application, jobhistory.jsp , in the following ways:

          2a: We will decide how many jobs to filter based on the following criteria:

          2a1: We stop at 11 tranches of serial numbers [the tenth boundary] or a day boundary, whichever comes first [but that page delivers buttons inviting you to ask for previous days, or more tranches]. Of course, as now, we stop at 100 items if we get that many items before crossing the directory boundary, but in the new code we will remember where to continue. However, in the new codebase we won't ls the files we don't present, improving responsiveness accordingly.

          2b: We will present the job history links, newest first.

          2b1: To make this coherent, we will remember where we left off for pagination.

          To summarize how the code will work, the pagination controls will look like this:

          Available Jobs in History (displaying 100 jobs from 1 to 100) [show all] [show 1000 per page] [show entire day] [first page][last page]

          < golem-jt1.megacorp.com-2010-05-18 golem-jt1.megacorp.com-2010-04-18 > [current JT instance, previous and/or following. This line of pagination controls is omitted if there is only one.]

          < newest 2010/06/14 2010/06/13 2010/06/12 2010/06/11 2010/06/10 oldest > [current day, two days previous, two days succeeding -- only within the current JT instance]

          < oldest 1 2 3 4 5 next newest > [directional words change when the search direction changes]

          2c: There is a notion of search direction. Currently we display oldest first, but I'm thinking of changing that because I judge "most recent first" to be the better default, especially as uptimes increase as the product becomes more mature. What do you think?

          Users can change direction by going to "last page", "oldest/newest date", or "oldest/newest task tracker". When you've done that, the navigation cursors change so you're going in the right direction.

          Chris Douglas added a comment -

          The scope of this issue has not been well defined. The designs are arguing about the correct subset of a database to implement for JobHistory, leaving a wide range of known (and as Allen points out, unknown) use cases ill served. This will not converge quickly.

          For purposes of consensus, this issue is a bug; the existing functionality is not handled efficiently. It should go without saying that the design should not be over-specific to today's use cases, but the issue's focus should remain on solving the problems cited and servicing the use cases already in the system. This is a misbehaving component, not a project implementing a small database in HDFS. Perhaps the title should change to reflect this.

          There are 3 operations to support (please amend as necessary):

          1. Lookup by JobID. This should not be worse than O(log n) (and should be O(1)), as it is a frequent operation.
          2. Find a set of jobs run by a particular user
          3. Find a set of jobs with names matching a regex

          (2) and (3) can require a scan, but the cost should be bounded. If there are common operator activities (like archiving old history, etc) then the layout should support that, but arbitrary queries are out of scope.

          The problems with the flat hierarchy are, obviously, the cost of listing files both in the JobTracker and NameNode. This can be ameliorated, somewhat, by HDFS-1091 and HDFS-985, but further optimizations/caching are possible if one can assume that recent entries are more relevant.

          Dick/Doug's format looks sound to me. Amar identified many complexities in implementing the configurable-schema, mini-database proposal and in my opinion: while the solutions are feasible, the virtues of a simpler fix for this issue outweigh the costs of solving those problems.

          I particularly like the idea of bounding scans of JobHistory to n entries, unless the user requests a deeper search. Caching recent entries, metadata about which subdirectories are sufficient for n entries, etc. are all reasonable optimizations, but adopting the new layout should be sufficient for this issue. Agreed?

          Allen Wittenauer added a comment -

          > 7: Perhaps there needs to be a programmatic API as well, reducing the need for people to read directories.

          In fact, I worry we are building something to make a faster UI but making it impossible to use in any other manner.

          Dick King added a comment -

          My proposal does present the display faster when the history files are numerous, but we do lose the ability to display the total count of matching jobs or to jump to the last page.

          Dick King added a comment -

          I've given this some more thought and I've devised a new design.

          I don't think that the subdirectory structure per se is the important issue, except to keep the directory sizes manageable. However, the important operations should be supported, with good performance, preferably in the jobhistory.jsp interface. We have to support reasonable searches in the jsp. To that end, I would like to do the following:

          1: Let the done jobs' directory structure be DONE/jobtracker-timestamp/123/456/789 where 123456789 is the job ID serial number. Leading zeros are depicted in the directory even if they're not in the serial number. Perhaps jobtracker-timestamp should be jobtracker-id?

          2: In the jsp, we could present the newest jobs first. This is probably what people want, and in common cases it speeds up the presentation when the user displays an early page. With the current naming convention, these are the jobs with the lexicographically latest file names.

          3: All the URLs in the jsp pages [including those behind forms] would have a starting job tracker ID and serial number encoded, so we can continue from where we left off, even though we keep adding new jobs to the beginning because of point 2. Subsequent pages will not overlap previous pages just because new jobs have been added at the beginning.

          4: When we do searches, we work back through the directories in reverse order, so we can stop when we populate a page rather than reading all of the history files' names.

          5: For low-yield searches we'll consider offering to stop after, say, 10K non-matching jobs have been ignored. This lets us process mistyped queries in a reasonable time.

          6: The start time is of interest. Inside the JobHistory code, as the cached history files are being copied to the DONE directory, an approximation of the start time is available in the modification time of the conf.xml file. We can copy that, either to the modification time of the new job history file [using setTime], or encode it into the filename in some manner [as we do with the job name]. Either way, we can then present it in the jsp result, or filter based on time ranges. What does the community think?
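
          A minimal sketch of the first alternative [the helper name and paths are illustrative; FileSystem.setTimes is the relevant call in the FileSystem API]:

              import java.io.IOException;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              // carry the conf.xml modification time [an approximation of the job
              // start time] onto the history file just copied to the DONE directory
              static void copyApproxStartTime(FileSystem fs, Path confXml, Path doneHistoryFile)
                  throws IOException {
                long approxStartTime = fs.getFileStatus(confXml).getModificationTime();
                fs.setTimes(doneHistoryFile, approxStartTime, -1);   // -1 leaves the access time alone
              }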

          7: Perhaps there needs to be a programmatic API as well, reducing the need for people to read directories.

          Allen Wittenauer added a comment -

          This should work just like core dump configuration options. A pattern is provided via an option and the system replaces the pattern's parameters with the job's unique values. This way everyone can get what they want in a very simple interface. Hard-coding a log file name is something that we shouldn't be doing anyway.

          > -1. There is no point supporting configuration options which are clearly infeasible in several cases.

          If we stop hard coding log file names and use pattern substitution, then this isn't the case anymore.
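
          A toy sketch of the substitution idea [the tokens and values here are hypothetical, not a proposed set]:

              // expand %-tokens in a configured pattern with the job's unique values
              static String expandPattern(String pattern, String user, String jobId, String date) {
                return pattern.replace("%u", user)
                              .replace("%j", jobId)
                              .replace("%d", date);
              }
              // expandPattern("%d/%u/%j", "alice", "job_201006140938_0042", "2010/06/14")
              //   -> "2010/06/14/alice/job_201006140938_0042"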

          Arun C Murthy added a comment -

          > I would prefer username to be one of the configuration options. Since it's configurable, it can be turned off for clusters having lots of users.

          -1. There is no point supporting configuration options which are clearly infeasible in several cases.

          Amar Kamat added a comment -

          A few comments:

          1. W.r.t. your comment, we could very well use the finishtime of the job. This is already published in the job summary, stored in the job status cache within the jobtracker, and later archived to the completed-job-status-store. Maybe we can reuse these features (i.e. the job status cache and status store).
          2. We should log jobhistory activities like
            1. the jobhistory folder regex used
            2. jobid to foldername mappings
              Logging will help in debugging and post-mortem analysis.
          3. Formats can change across runs. How do we plan to take care of that? One thing we can do is to have a unique folder per pattern for storing the files. The (unique) folder-name should be based on the jobhistory structure pattern. This mapping of jobhistory folder regex to foldername should be logged.
            Clients that need really old jobhistory files analyzed will dig up the jobhistory folder format, map it to the folder, and provide the username, jobid and finishtime to get the file. The client can get the username and finishtime by querying the JobTracker for the job status (via the completed-jobstatus-store). See Future Steps #1.
          4. How about keeping N items in the top-level directory and moving them to the appropriate place only when the total item count crosses N?
            Example (assume /done/%user/%jobid as the format and N=5)
            1. The first job gets added to /done/job1
            2. The 5th job gets added to /done/job5
            3. The 6th job gets added to /done/job6 and /done/job1 gets moved to /done/user1/job1
            4. and so on
              So the movement happens only on overflow. The benefit of this change is that without any indexing, we can show the most recent N jobs on the jobhistory webui. This pattern can be enabled for all subfolders also. So if the jobhistory format specified is %user/ then queries like 'give the recent 5 items for all the users' can also be answered quickly.
          5. The webui should provide 2 views
            1. top/recent few (show jobs from the topmost-level folder)
            2. a browse-able view where YYYY/MM/DD etc. is shown as is. This can be configurable and turned off for complicated structures like 00/00/00-99 etc., which users might not be able to make sense of. Also there should be some kind of widget in JobHistory that, given username, jobid and finishtime, provides the complete jobhistory filename. See Future Steps #2.
          6. .... He raised the issue that a practical cluster has more distinct users than we would want to create DFS directories for, especially if the directory structure is further split on timestamps.

            I would prefer username to be one of the configuration options. Since it's configurable, it can be turned off for clusters having lots of users.

          Future steps :

          1. As of today, we have jobhistory files directly dumped in the done folder. We might want to move these files into the format we want (for a good user experience). Maybe some kind of offline admin tool can help here (maybe under mradmin?). It might make sense to name the final jobhistory file (leaf-level) as $username_$jobid_$finishtime. This will enable us to restructure job history files across various formats.
          2. There should be some way to find out which regex/format was used given the jobtracker start time (which is one of the components in the jobid). To make it easier for clients, maybe the log files related to jobhistory updates can be published, or the JobTracker should be in a position to answer this.
            Thoughts?
          Dick King added a comment -

          A coworker has suggested that I remove the "%u" user option [from my previous comment, 28/May/10 06:01 PM].

          He raised the issue that a practical cluster has more distinct users than we would want to create DFS directories for, especially if the directory structure is further split on timestamps.

          Dick King added a comment -

          If the cluster configuration codes any time stamps, we have to create them. We'll do this the first time we make a filename for a given job.

          Having done that, we'll have a map mapping job serial numbers to directory segments [which we will intern; there will be many duplicates].

          Having done that, we'll keep 250K of these; we'll drop the oldest one when we add a new one that would otherwise put us over that limit. We'll therefore use a TreeMap. I expect about 20-40 bytes per entry: 16 bytes for each tree node, and 8 or 16 for the key, which would be an Integer. Recall that the directory segments are interned and would essentially vanish.

          This table only exists if there is a time stamp operator in the format string.
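
          A sketch of that bounded map [sizes and names illustrative; this shows the idea, not the patch]:

              import java.util.TreeMap;

              class SerialToSegmentMap {
                private static final int MAX_ENTRIES = 250000;
                private final TreeMap<Integer, String> map = new TreeMap<Integer, String>();

                synchronized void put(int jobSerial, String directorySegment) {
                  map.put(jobSerial, directorySegment.intern());   // segments repeat heavily; intern them
                  if (map.size() > MAX_ENTRIES) {
                    map.remove(map.firstKey());   // serials are assigned in order, so first == oldest
                  }
                }

                synchronized String get(int jobSerial) {
                  return map.get(jobSerial);
                }
              }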

          Dick King added a comment -

          It has been correctly pointed out to me that the syntax, %xi,j, is just weird.

          In keeping with Java data format conventions, I will use %leftmost_index$width.precision x to name a segment of the jobID index.

          leftmost_index names the leftmost digit and width names the number of digits; width defaults to 1. If you name a digit that doesn't exist, the output gets the empty string in the corresponding position [except as specified in precision, below]. It is an error for width to exceed leftmost_index.

          precision is a minimum number of digits to output; it defaults to 0. It is an error for precision to exceed width. If precision requires more digits than exist in the index, we supply zeroes.

          It is an error to omit leftmost_index. It is an error to code a $ if there is no width. It is an error to code a . if there is no precision. It is an error to omit width if there is a precision.

          This configuration variable lives in mapreduce.jobhistory.completed.subdirectory.format. The default is the empty string [which gives the behavior that we get now: no subdirectories].
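
          A sketch of how one such segment might be applied to a serial number, under my reading of the rules above [the method is hypothetical]:

              // %leftmost_index$width.precision x applied to a job serial number;
              // positions count from the right, 1-based; nonexistent digits
              // contribute nothing, except that precision zero-fills the output
              static String applySegment(int serial, int leftmostIndex, int width, int precision) {
                String digits = Integer.toString(serial);
                StringBuilder out = new StringBuilder();
                for (int pos = leftmostIndex; pos > leftmostIndex - width; pos--) {
                  int i = digits.length() - pos;
                  if (i >= 0) out.append(digits.charAt(i));
                }
                while (out.length() < precision) out.insert(0, '0');   // zero-fill to precision
                return out.toString();
              }
              // applySegment(987654321, 9, 6, 0) -> "987654"
              // applySegment(54321,     9, 6, 6) -> "000054"   [missing digits zero-filled]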

          Dick King added a comment -

          The user and jobID index can both be obtained from arguments to calls to the API JobHistory.getJobHistoryFile(...). The time stamp cannot.

          I'll have to store maps from jobID indices to time of first JobHistory.getJobHistoryFile(...) call to support the functionality if the cluster owner specifies time-stamp based directory structure. This map lives in the job tracker and creates a practical limit of perhaps a half million jobs, if this feature is used. Does this seem reasonable?

          Dick King added a comment -

          3: jobhistory.jsp has to be convinced to recursively process subdirectories within historyLogDir.

          Dick King added a comment -

          Okay...

          1: I will have to fix rumen to recursively descend into a directory of directories to make it capable of swallowing a history directory [a sketch of this descent follows this list].

          1a: I would like to still process the job IDs in lexicographical order [which is almost always chronological order] for compatibility with applications that expect approximately chronological order.

          1b: This creates a memory footprint of about 200b/entry, which may impose a limit of one million jobs or so.

          2: I will make the directories configurable. How about the following controls?

          locution   meaning
          %y         year [four digits] [The Y10K problem will be someone else's problem :-) ]
          %m         month [two digits, leading zeros present]
          %d         day [two digits, leading zeros present]
          %h         hour [two digits, leading zeros present]
          %i         mInute [two digits, leading zeros present]
          %u         user
          %xi-j      the digits from the jobID index whose positions run from i through j, downwards, numbered from the right, 1-based. If you choose any digits that don't exist you get no characters in the output for those digits. %x9-3 will give you directories holding logs for at most 100 jobs, unless you omit timestamp selection controls.
          /          directory component separator [even on platforms with a different separator character] -- if there are two or more slashes in a row we swallow all but one, and note that there's an implicit leading and trailing separator character
          any other character: itself

          Did I leave anything out?
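
          Regarding point 1, a hedged sketch of the recursive descent [names illustrative], with callers sorting the result afterwards to get the approximately chronological order of 1a:

              import java.io.IOException;
              import java.util.List;
              import org.apache.hadoop.fs.FileStatus;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              // collect every history file under dir, descending into subdirectories
              static void collectHistoryFiles(FileSystem fs, Path dir, List<Path> out)
                  throws IOException {
                FileStatus[] entries = fs.listStatus(dir);
                if (entries == null) return;                       // dir may not exist
                for (FileStatus entry : entries) {
                  if (entry.isDir()) {
                    collectHistoryFiles(fs, entry.getPath(), out);
                  } else {
                    out.add(entry.getPath());
                  }
                }
              }
              // afterwards, sort the collected paths lexicographically, which is
              // almost always chronological order for history file names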

          Edward Capriolo added a comment -

          Being able to control the structure better is definitely a nice feature. Practically, dividing the job folders by mm/dd/yy would solve the immediate problem of having to clean up and restart your JobTracker when you hit the ext3 limit. Introducing a variable into the jobtracker, mapred.jobhistory.maxjobhistory, and a FIFO queue might be helpful as well. As things stand now, downtime and cleanup are needed to keep the JobTracker running well, which is less than optimal.

          Rajiv Chittajallu added a comment -

          Each site might have different requirements. Some might run thousands of jobs as a single user and others might have a lot of users. Can we make the directory and file name configurable too? Something like "%Y/%m/%d/%j"

          Doug Cutting added a comment -

          > Yes. I agree that breaking by job-ids make things a lot simpler.

          Great! Does anyone else have concerns with that approach?

          > admins will have to create per-user directories in history folder where JobTracker can write to

          Another possibility might be that jobs can specify a group permitted to read its logs. The jobtracker would chgrp the logs to that group. The jobtracker's uid would need to be a member of that group. The difference is that, rather than having to configure each filesystem for each user, one can just configure the user/groups database. Another difference is that this would permit logs to be readable by more than the single user who submitted the job. But this is all stuff for later...

          Amareshwari Sriramadasu added a comment -

          > But, as I stated later, I think breaking directories by job-id makes lookup simpler and gives us more explicit limits over directory sizes. So I'd prefer that to time-based directories.

          Yes. I agree that breaking by job-ids makes things a lot simpler.

          > So, if we have all job history files in a single tree, then we'd want the directories in that tree to be world readable, but the log files to be owned and readable by the job's submitter.

          To achieve this, the JobTracker would need super-user privileges on HDFS, to do the chown. If we assume the JT would have super-user privileges on HDFS, then we can go with the job-id based directory structure.

          > Or, if we have per-user directories, we could make those readable only by that user, providing greater privacy. Is this what you mean?

          Yes, I meant this. In this case, admins will have to create per-user directories in the history folder where the JobTracker can write to. The JT will not need super-user privileges here.

          Doug Cutting added a comment -

          > Per-hour directories look like over-kill. On the average, For each user, there would be 10 jobs finished in an hour.

          The maximum is more important than the average, no? Couldn't there be a user that submits a job every minute or more? But, as I stated later, I think breaking directories by job-id makes lookup simpler and gives us more explicit limits over directory sizes. So I'd prefer that to time-based directories.

          > This looks fine. But when we have permissions in place, inserting user becomes difficult.

          I'm not sure what you mean by "permissions in place" and "inserting user". It seems that the intent is for users to be able to directly read their own job history files from HDFS. It also seems like we generally don't want users to be able to read others' job history files. So, if we have all job history files in a single tree, then we'd want the directories in that tree to be world readable, but the log files to be owned and readable by the job's submitter. Or, if we have per-user directories, we could make those readable only by that user, providing greater privacy. Is this what you mean?

          When we call Cluster.getJobHistoryUrl() we'll know the user's ID, so I don't see a top-level directory per user changing things much, if that's what you're worried about. The more important decision, it seems to me, is how we break things into directories within that. Using job ids seems more scalable than using time-of-day. Do you agree?

          Amareshwari Sriramadasu added a comment -

          > Nick Rettinghouse, Tim Williamson, and Rajiv Chittajallu all suggested a preference for per-hour directories, in particular, USER/YYYY/MM/DD/HH, an option you did not list. Should we perhaps err on the side of a deeper structure, to ensure that we don't have to re-structure things again?

          Per-hour directories look like overkill. On average, for each user, there would be 10 jobs finished in an hour.

          > However implementing Cluster.getJobHistoryUrl() would be expensive for archived jobs, since the jobtracker must search the entire directory tree.

          Here, the JobTracker need not search the entire directory tree. If the JobTracker does not have it in the cache, the Job Client itself can do the search.

          > Perhaps the directory structure should instead be based purely on the job ID? E.g., something like: jobtrackerstarttime/00/00/00

          This looks fine. But when we have permissions in place, inserting the user becomes difficult.

          Doug Cutting added a comment -

          > Options for the directory structure of the history files are

          Nick Rettinghouse, Tim Williamson, and Rajiv Chittajallu all suggested a preference for per-hour directories, in particular, USER/YYYY/MM/DD/HH, an option you did not list. Should we perhaps err on the side of a deeper structure, to ensure that we don't have to re-structure things again?

          I like the idea of a cache of recent jobs in the JobTracker. This can be initialized by walking this directory tree, then maintained incrementally. However implementing Cluster.getJobHistoryUrl() would be expensive for archived jobs, since the jobtracker must search the entire directory tree.

          Perhaps the directory structure should instead be based purely on the job ID? E.g., something like:
          jobtrackerstarttime/00/00/00
          jobtrackerstarttime/00/00/01
          ...
          jobtrackerstarttime/00/00/99
          jobtrackerstarttime/00/01/00
          etc.

          Only if a jobtracker ran more than 1M jobs would its top-level directory have more than 100 entries. Constructing the cache of recent jobs would be fast, as would Cluster.getJobHistoryUrl(JobID). Access to jobs in the cache by user id, date, etc. could be fast, since the cache is in memory. Access to older jobs by user id, date, etc. would not be supported.
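
          As an illustrative sketch of this layout [assuming the serial number is zero-padded to six digits and split into two-digit levels]:

              // jobtrackerstarttime/00/00/00-style path for a job's serial number
              static String historyDir(String jobtrackerStartTime, int jobSerial) {
                String s = String.format("%06d", jobSerial);
                return jobtrackerStartTime + "/" + s.substring(0, 2)
                     + "/" + s.substring(2, 4) + "/" + s.substring(4, 6);
              }
              // historyDir("1276500000", 1234) -> "1276500000/00/12/34"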

          As an enhancement, we might later place index files in the higher-level directories, listing job ids sorted by username, date, etc. These might be written to leaf directories after the 100th job is added to a directory, and to non-leaves after the 10,000th job is added, etc. They could be generated from the cache. With such indexes, user and time-based queries to the archives could be resolved in log n time.

          Amareshwari Sriramadasu added a comment -

          Summarizing the requirements, this issue tries to solve:
          1. Maintaining the history folder becomes hard for administrators, since a huge number of job history files in a single directory cannot be listed.
          2. User experience (response time and searchability) with the history UI should not suffer because of a huge number of files in the history directory.
          3. Given a jobid, a user should be able to find out what the history url is.

          Amareshwari Sriramadasu added a comment -

          > I would also request for removing jobname from the history filename.

          This is done as part of MAPREDUCE-157. I will port the change to Yahoo! distribution with this patch.

          Options for the directory structure of the history files are

          1. {$hadoop.log.dir}/history/done/YYYY-MM-DD/
          2. {$hadoop.log.dir}/history/done/YYYY-MM-DD/USER
          3. {$hadoop.log.dir}/history/done/USER/YYYY-MM-DD/
          4. {$hadoop.log.dir}/history/done/USER/YYYY/MM/DD
          5. {$hadoop.log.dir}/history/done/YYYY/MM/DD/USER

          For the directory structure, I would go with option #1, because it is easy to maintain. We can add more when needed.

          We can have a cache in the JobTracker to look up the history location for each jobid (this can be moved to the HistoryServer when we move history to a separate server). We can have the JT maintain the cache for the last 20 days of history (configurable).
          Now, the file name of the history log file is <jobid>_<user>.log. The job id is about 20 characters long, and if the user name is about 25 characters, the jobhistory file name is about 50 bytes long. For a given jobid, the cache entry in the JT will be at most about 100 bytes. 50,000 such entries would make it 5MB.
          We can have a configuration to limit the number of entries in the cache, the default value being 50,000.
          Thus, the cache is controlled by the number of days for which the cache is maintained and is also capped by the number of entries in the cache.
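
          One simple way to cap such a cache [a sketch, not the proposed implementation; the 50,000 default is from the description above, and the day-based expiry would be layered on top]:

              import java.util.LinkedHashMap;
              import java.util.Map;

              // jobid string -> history file location, evicting least-recently-used entries
              class HistoryLocationCache extends LinkedHashMap<String, String> {
                private final int maxEntries;

                HistoryLocationCache(int maxEntries) {          // e.g. 50000
                  super(16, 0.75f, true);                       // true: iterate in access order
                  this.maxEntries = maxEntries;
                }

                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                  return size() > maxEntries;                   // drop the LRU entry on overflow
                }
              }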

          If the history location is not present in the JT cache, the JT history web ui does not show the page.
          An interested user can call the api Cluster.getJobHistoryUrl(JobID, boolean getFromDFS) to get the url from the DFS if it is not present in the JT.
          We can add bin/hadoop job -historyurl <jobid> to get the historyurl for the jobid from the JT cache. We can add another argument to the command to get the history url from DFS if it is not present in the JT cache.
          Then, HistoryViewer can be used to view the history on the command line.

          Thoughts?

          Hong Tang added a comment -

          I am changing this to "Bug" instead of "Improvement" because we see that the hadoop script runs out of heap space when there are several hundred thousand job history files.

          Amar Kamat added a comment -

          I think it's high time we got this rolling. Since the job history files can be moved to HDFS, efficient searching and jobhistory file management become critical.

          Rajiv Chittajallu added a comment -

          +1 for Tim Williamson's suggestion.

          I would also request for removing jobname from the history filename.

          Tim Williamson added a comment -

          It would be nice if whatever scheme is adopted ensured some upper bound on the number of logs in any single directory. The YYYY/MM/DD/HH scheme would do that in practice. And there's no reason it couldn't be:
          user/YYYY/MM/DD/HH
          which would have the best of both worlds.

          dhruba borthakur added a comment -

          The most common case is when a user is looking for the logs of a job that he had submitted earlier. So, your proposal looks good to me. +1

          On a general note, it appears that what we are trying to do is to index the metadata of completed jobs for efficient retrieval. Is there any way that Apache Derby http://db.apache.org/derby/ might help in this regard?

          Nick Rettinghouse added a comment -

          We have an extremely high job rate. Sorting by YYYY/MM/DD/HH would be a great help. (We could live with YYYY/MM/DD.)

          Amar Kamat added a comment -

          I had an offline discussion with Devaraj, Hemanth and Sharad. It seems like the following structure should solve this issue:

1. old history files: path-to-job-history/
2. history files for the jobtracker on host hostname: path-to-job-history/hostname
3. history files for user username using the jobtracker running on hostname: path-to-job-history/hostname/username
4. job history file format: <start-time><jobid><jobname>

Structuring it further by year, month and day might prove useful, but for now it looks like a premature step; if needed, we can add it later. Users who submit jobs at a very high rate will be affected more than users who submit jobs less frequently. Searching will be easier per-user (a sketch of this layout follows below).

Future steps:
1) Add date-level info in the structure, or at least in the display
2) Add indexing info for faster access/display
3) Provide various views, like recent jobs; sorting by day/week/month/year; job name (sorting and structuring); etc.
4) Secure access
5) Faster access and analysis (involves changes/tweaks to JobHistory and parsing).

          Thoughts?
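
A minimal sketch of the layout proposed above; the underscore separator in the filename and the helper name are assumptions, since the proposal only gives the field order:

    import org.apache.hadoop.fs.Path;

    public class ProposedLayoutSketch {
      // Hypothetical helper reflecting the structure proposed above:
      //   <history-root>/<jobtracker-host>/<user>/<start-time>_<jobid>_<jobname>
      static Path historyFileFor(Path historyRoot, String jtHost, String user,
                                 long startTime, String jobId, String jobName) {
        Path userDir = new Path(new Path(historyRoot, jtHost), user);
        return new Path(userDir, startTime + "_" + jobId + "_" + jobName);
      }

      public static void main(String[] args) {
        Path p = historyFileFor(new Path("/mapred/history"), "jt-host-01",
            "alice", 1247650000000L, "job_200907151430_0001", "wordcount");
        // prints /mapred/history/jt-host-01/alice/1247650000000_job_200907151430_0001_wordcount
        System.out.println(p);
      }
    }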

          Amar Kamat added a comment -

HADOOP-4017 tries to unify the way job-history filenames are handled. It would then be easy to modify them in one place instead of three.

          Doug Cutting added a comment -

> But then I thought it's better to fix/add that when needed.

          That's fine with me!

          Amar Kamat added a comment -

I too had thought of having a second-level categorization based on date/time, but then I thought it's better to fix/add that when needed. For now, fixing the search by user should help us overcome the issue. Let me know if we should also categorize by date in this issue.

          Doug Cutting added a comment -

          > The type of search we do there is given a jobtracker-hostname, job-id, username and job-name [...]

Thanks for the explanation. In that case, a directory per username probably does make sense. Really big directories are generally cumbersome, so you might still slice things by date as well, either above or below the username, so that even a user who has run lots of jobs won't cause, e.g., huge RPCs for listings.

          Amar Kamat added a comment -

Doug, the search is w.r.t. job recovery. The type of search we do there is: given a jobtracker hostname, job id, username and job name, find the job-history file. The way we do it now is:

• construct a regex from the jobtracker hostname, job id, username and job name
• construct a path filter that accepts files matching the pattern and rejects the rest
• use the DFS listing API to find the files matching the pattern

This is a costly operation, as all the files are scanned linearly (a sketch of this lookup follows below). Over time the history folder can grow large, leading to longer search times, and every user is hit by this cost. With the above-mentioned optimization we can reduce the search time for most users.
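
A minimal sketch of that lookup pattern; the history directory and the regex below are placeholders, not the actual pattern the JobTracker builds:

    import java.io.IOException;
    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    public class HistorySearchSketch {
      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());

        // Placeholder regex; the real one is built from the jobtracker
        // hostname, job id, username and job name.
        final Pattern pattern =
            Pattern.compile("jt-host-01_.*_job_200907151430_0001_alice_.*");

        // Accept files whose names match the pattern, reject the rest.
        PathFilter filter = new PathFilter() {
          public boolean accept(Path path) {
            return pattern.matcher(path.getName()).matches();
          }
        };

        // The DFS listing API applies the filter, but every file in the
        // directory is still tested one by one -- this is the costly step.
        FileStatus[] matches = fs.listStatus(new Path("/mapred/history"), filter);
        for (FileStatus status : matches) {
          System.out.println(status.getPath());
        }
      }
    }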

          Doug Cutting added a comment -

          > using username will make the search much more efficient

          Why is that? Are most operations that touch the job history user-specific? I would have guessed that most were rather time-specific, that the most frequent operation would be to browse through the job history by time. Is that not the case?


People

• Assignee: Dick King
• Reporter: Amar Kamat
• Votes: 2
• Watchers: 21
