Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: client, jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Provides a way to configure the cache of JobStatus objects for retired jobs.
      Adds an API in RunningJob to access the history file URL.
      Adds an LRU-based cache for job history files loaded in memory when accessed via the JobTracker web UI.
      Adds a Retired Jobs table to the JobTracker UI. A job moves from the Running table to the Completed or Failed table, and then to the Retired table when it is purged from memory. The Retired table shows the last 100 retired jobs. The Completed and Failed tables are shown only if they contain at least one job.

      Description

      MAPREDUCE-814 will provide a way to keep the job history files in HDFS. There should be a way to get the URL of a completed job's history file. Completed jobs can then be purged from jobtracker memory more aggressively, since clients can retrieve the information from the history file. The jobtracker need only maintain very basic info about completed jobs.

      1. 817_ydist_new_1.patch
        2 kB
        Sharad Agarwal
      2. 817_ydist_new.patch
        38 kB
        Sharad Agarwal
      3. 817_ydist.patch
        49 kB
        Sharad Agarwal
      4. 817_v3.patch
        61 kB
        Sharad Agarwal
      5. 817_v2.patch
        64 kB
        Sharad Agarwal
      6. 817_v1.patch
        60 kB
        Sharad Agarwal

        Issue Links

          Activity

          Sharad Agarwal added a comment -

          We can add an API in JobClient, say:
          String getCompletedJobHistoryURL(jobId) throws IOException
          If the job is not completed or the history file is not yet available in HDFS, it will throw an exception with a proper message.

          There is a concern that the history file name can't be inferred from the job id alone. Currently the file name consists of the job id, username, timestamp, and other info, which is used by the history viewer UI and the CLI tool. So for this API, the jobtracker would have to cache the file name for each job id.
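The lookup proposed above could look roughly like the sketch below. It is only an illustration of the idea, not code from any patch on this issue: the class name HistoryFileIndex, the method historyFileCopied as used here, and the example URL format are all hypothetical. The only parts taken from the comment are the method name getCompletedJobHistoryURL, the per-job-id cache of file names, and the IOException thrown when the file is not yet available.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: class and method layout are illustrative assumptions.
class HistoryFileIndex {
    // job id -> history file URL, recorded once the file is available in HDFS.
    // Needed because the file name also embeds username, timestamp, etc.
    private final Map<String, String> historyFileByJobId = new ConcurrentHashMap<>();

    // Assumed hook: called after the history file has been copied to HDFS.
    void historyFileCopied(String jobId, String historyFileUrl) {
        historyFileByJobId.put(jobId, historyFileUrl);
    }

    // Returns the cached URL, or throws if the job has no history file yet.
    String getCompletedJobHistoryURL(String jobId) throws IOException {
        String url = historyFileByJobId.get(jobId);
        if (url == null) {
            throw new IOException("History file for job " + jobId
                + " is not available: the job is not completed or the file"
                + " has not been moved to HDFS yet");
        }
        return url;
    }
}
```

A concurrent map is used here only so that the sketch is safe to call from several threads; how the real jobtracker would guard this state is not specified in the comment.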

          The related issue is job retirement from memory. Currently jobs are retired based on "mapred.jobtracker.retirejob.interval.min" and "mapred.jobtracker.completeuserjobs.maximum". Since the full job data structures are huge, they can't stay in memory for long. I propose that the jobtracker drop the job from memory as soon as its history file is available in HDFS (MAPREDUCE-814). The jobtracker keeps only a bare-minimum completed job report (status, #failedmaps, #failedreduces, ...), on the order of a few bytes, in memory.
          Assuming 100 bytes are stored for each completed job, 10,000 completed job reports in memory would take 1 MB.
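The estimate above is a plain multiplication, spelled out here as a small sketch. The class and method names are hypothetical, and the 100-byte figure is the assumption stated in the comment, not a measured size.

```java
// Hypothetical helper that reproduces the back-of-the-envelope estimate.
class RetiredJobMemoryEstimate {
    // Total bytes held for retained job reports, ignoring container overhead.
    static long estimateBytes(long bytesPerJob, long retainedJobs) {
        return bytesPerJob * retainedJobs;
    }

    public static void main(String[] args) {
        // 100 bytes per report (assumed) times 10,000 reports.
        long bytes = estimateBytes(100, 10_000);
        System.out.println(bytes + " bytes"); // 1000000 bytes, roughly 1 MB
    }
}
```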

          Sharad Agarwal added a comment -

          This patch:

          • Adds the getHistoryFile() API to JobStatus.
          • As soon as the job history file is moved to HDFS, the entire job data structures are removed from memory, except for the JobStatus object. JobStatus objects are kept in the RetiredJobs cache, with a default cache size of 5000.
          • A Retired Jobs table is added to the Jobtracker UI, showing the last 100 jobs in reverse "finish time" order. Pagination could be added later to show more jobs. The job id links to the job history page for that job.
          • On accessing the job history file via the UI, the history info is loaded and put in an LRU cache whose default size is 5.
          • The "Completed" and "Failed" tables show up in the UI only if there are jobs that have completed but are not yet "Retired".
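The two bounded caches described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in the patch: the class RetiredJobsSketch, the RetiredJob stand-in for JobStatus, and all method names are hypothetical, and a String stands in for the parsed history info. The sketch uses java.util.LinkedHashMap with removeEldestEntry, once in insertion order for the retired-jobs cache and once in access order for the LRU cache of loaded history files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the retired-jobs cache and the history LRU cache.
class RetiredJobsSketch {
    // Minimal stand-in for the status object kept after a job is retired.
    static final class RetiredJob {
        final String jobId;
        final long finishTime;
        final String historyFile;

        RetiredJob(String jobId, long finishTime, String historyFile) {
            this.jobId = jobId;
            this.finishTime = finishTime;
            this.historyFile = historyFile;
        }
    }

    // Map that drops its eldest entry once it grows beyond maxSize.
    // accessOrder=false evicts the oldest insertion; true gives LRU eviction.
    static <K, V> Map<K, V> boundedMap(final int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    private final Map<String, RetiredJob> retired;
    private final Map<String, String> loadedHistory;

    RetiredJobsSketch(int retiredCacheSize, int historyCacheSize) {
        this.retired = boundedMap(retiredCacheSize, false);
        this.loadedHistory = boundedMap(historyCacheSize, true);
    }

    // Keep only the minimal status once the full job structures are dropped.
    synchronized void retire(RetiredJob job) {
        retired.put(job.jobId, job);
    }

    // Most recently finished jobs first, at most 'limit' of them.
    synchronized List<RetiredJob> lastRetired(int limit) {
        List<RetiredJob> jobs = new ArrayList<>(retired.values());
        jobs.sort(Comparator.comparingLong((RetiredJob j) -> j.finishTime).reversed());
        return new ArrayList<>(jobs.subList(0, Math.min(limit, jobs.size())));
    }

    // Load history on first access; later accesses hit the LRU cache.
    synchronized String loadHistory(String jobId) {
        String info = loadedHistory.get(jobId);
        if (info == null) {
            info = "parsed history of " + jobId; // stand-in for reading from HDFS
            loadedHistory.put(jobId, info);
        }
        return info;
    }

    synchronized boolean isHistoryCached(String jobId) {
        return loadedHistory.containsKey(jobId);
    }
}
```

With the defaults mentioned above, such a sketch would be constructed as new RetiredJobsSketch(5000, 5) and the UI table would call lastRetired(100).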
          Sharad Agarwal added a comment -

          Fixed bugs and code cleanup.

          Sharad Agarwal added a comment -

          Added more tests in TestJobRetire and incorporated Devaraj's offline comments:
          • removed unnecessary synchronization in Jobtracker#historyFileCopied
          • fixed javadoc in JobStatus

          Devaraj Das added a comment -

          +1

          Sharad Agarwal added a comment -

          test-patch passed.
          ant test passed except TestCapacityScheduler, which is tracked in MAPREDUCE-848.

          Sharad Agarwal added a comment -

          I just committed this.

          Sharad Agarwal added a comment -

          Patch for Yahoo's distribution. This patch depends on the patch from MAPREDUCE-814 - https://issues.apache.org/jira/secure/attachment/12416019/814_ydist.patch

          Sharad Agarwal added a comment -

          New patch for Yahoo's distribution. It does NOT introduce client side API changes.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #46 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/46/).
          Add a cache for retired jobs with minimal job info and provide a way to access history file url.

          Sharad Agarwal added a comment -

          Patch for Yahoo's distribution. It fixes an issue in RetireJob code related to concurrent modification. This patch must be applied on top of earlier patch -> https://issues.apache.org/jira/secure/attachment/12416227/817_ydist.patch

          Sharad Agarwal added a comment -

          Correction to the last comment: https://issues.apache.org/jira/secure/attachment/12416445/817_ydist_new_1.patch must be applied on top of the previous patch -> https://issues.apache.org/jira/secure/attachment/12416316/817_ydist_new.patch

            People

            • Assignee:
              Sharad Agarwal
              Reporter:
              Sharad Agarwal
            • Votes:
              0
              Watchers:
              10

              Dates

              • Created:
                Updated:
                Resolved:
