Hadoop Map/Reduce: MAPREDUCE-291

Optionally a separate daemon should serve JobHistory

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: jobtracker
    • Labels: None

      Description

      Currently the JobTracker serves the JobHistory to end-users from files on local disk or HDFS. Running very large clusters with a large user base can result in heavy job-history traffic that needlessly taxes the JobTracker. The proposal is to have an optional daemon that handles serving of job-history requests.
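
      As a rough sketch of the idea (this is not code from any patch; the property name is the one proposed in the comments below, and the empty-means-embedded convention is an assumption):

          // Hypothetical sketch: if a standalone history server is configured,
          // point history traffic at it; otherwise keep serving history from
          // the JobTracker web UI, as is done today.
          import org.apache.hadoop.conf.Configuration;

          public class HistoryLocation {
            static String historyUrl(Configuration conf, String jobTrackerWebUrl) {
              String addr = conf.get("mapred.job.history.server.address", "");
              if (addr.trim().length() == 0) {
                return jobTrackerWebUrl + "/jobhistory.jsp";  // embedded (default)
              }
              return "http://" + addr + "/jobhistory.jsp";    // standalone daemon
            }
          }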

      Attachments

      1. HADOOP-5083-v1.11.11.patch
        153 kB
        Amar Kamat
      2. HADOOP-5083-v1.11.4.patch
        148 kB
        Amar Kamat
      3. HADOOP-5083-v1.11.5.patch
        148 kB
        Amar Kamat
      4. HADOOP-5083-v1.11.9.patch
        155 kB
        Amar Kamat
      5. HADOOP-5083-v1.2.patch
        114 kB
        Amar Kamat
      6. HADOOP-5083-v1.9.4.patch
        145 kB
        Amar Kamat
      7. HADOOP-5083-v1.9.patch
        142 kB
        Amar Kamat

          Activity

          Amar Kamat added a comment -

          Result of test-patch :

          [exec] -1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 14 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     -1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
          Amar Kamat added a comment -

          Attaching a patch that incorporates Sharad's comments. Some of the changes are similar to HADOOP-5247 but are required for the proper functioning of this patch.

          Sharad Agarwal added a comment -

          I think we should keep the default value of "mapred.jobtracker.retirejob.interval" at the old value, that is, 24 hrs. There are only a few cases that need a lower value, and those can set it lower. By default, people would expect to see jobs in the completed list for some time after a job is finished. Also, task reports won't be accessible in the jobclient for retired jobs.
          minor nit:
          should we create a new UtilsforTests in the mapreduce package and write the utilities related to the new API there?
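
          For reference, a minimal sketch of the 24 hr default Sharad argues for, assuming the interval is read in milliseconds through the Configuration API:

              // Sketch only: read the retire interval with a 24 hr default.
              import org.apache.hadoop.conf.Configuration;

              public class RetireInterval {
                static long retireIntervalMs(Configuration conf) {
                  // 24 hrs = 24 * 60 * 60 * 1000 ms = 86400000 ms
                  return conf.getLong("mapred.jobtracker.retirejob.interval",
                                      24L * 60 * 60 * 1000);
                }
              }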

          Sharad Agarwal added a comment -

          some comments:
          JobHistoryServer: do we need an Active/Passive state? I think an embedded boolean should be good enough, as there are only 2 cases: embedded/standalone.
          HttpServer: I think adding otherWebAppContext may be avoided. We can use the already available method addContext(Context ctxt, boolean isFiltered). Or perhaps we just filter out the jobhistory url in the standalone case.

          Amar Kamat added a comment -

          Attaching an updated patch.

          Amar Kamat added a comment -

          Attaching a patch that addresses Sharad's comments. Moved all the JobHistory related code to JobHistoryServer. JobHistoryServer now also acts as the JobHistory info manager. Result of test-patch is as follows:

          -1 overall.
               [exec]
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]
               [exec]     +1 tests included.  The patch appears to include 17 new or modified tests.
               [exec]
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]
               [exec]     -1 findbugs.  The patch appears to introduce 3 new Findbugs warnings.
               [exec]
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec]
               [exec]     -1 release audit.  The applied patch generated 835 release audit warnings (more than the trunk's current 832 warnings).
          

          The findbugs warnings are because of

          • System.exit()
          • Thread.sleep() with a lock held: this is required because the lock can't be released even while sleeping. The lock is taken while initializing and ideally should not be released until init completes.

          The release audit warnings are for the jsp files. They were simply moved from one folder to another; previously these files did not have any headers.
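
          For readers unfamiliar with this Findbugs pattern, an illustrative sketch (not the patch's actual code) of sleeping with a lock held, which Findbugs reports as SWL_SLEEP_WITH_LOCK_HELD:

              // Illustration only: the init lock is deliberately held across the
              // sleep, so no other thread can take it until init completes.
              public class InitHolder {
                private final Object initLock = new Object();
                private volatile boolean initialized = false;

                void waitForInit() throws InterruptedException {
                  synchronized (initLock) {
                    while (!initialized) {
                      Thread.sleep(1000);  // Findbugs: SWL_SLEEP_WITH_LOCK_HELD
                    }
                  }
                }
              }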

          Sharad Agarwal added a comment -

          Some comments on the patch:
          I think we should move all history related code from the JobTracker to another common class that gets used in both the in-process and standalone modes. This class encapsulates all history related code, including whether it is running in local mode or standalone mode, and the jobtracker just instantiates it.
          Also, the jsps could be made agnostic to whether they are running in the jobtracker's jvm or standalone. They just invoke methods on the instance of this common class.

          Amar Kamat added a comment -

          Attaching a patch incorporating Devaraj's comments.

          1. The mapping from tracker-name to job-ids is removed when a tasktracker is lost.
          2. The dynamically resolved JobHistoryServer address is written back to the conf.
          3. Moved test code related to JobHistoryServer into a new test file.

          Testing in progress.

          Amar Kamat added a comment -

          Dhruba, HADOOP-4934 is opened to address that. We can discuss this out there. My original intention was to provide a tabbed view (running/succeeded/failed/killed) instead of a long scrollable view, and to provide various viewing modes like sort/group based on month, week, day, time (hourly?), user, jobname (lexicographic?), etc. Don't know how much of it is doable and required.

          dhruba borthakur added a comment -

          Amar, thanks for the explanation.

          I am assuming that the history folder will have lots and lots of jobs over time (possibly in the tens of thousands). In that case, a user submits a job in the morning, comes back the following day to look at its status, finds that the job is retired, and goes to the history server. Now he/she has to locate the job among the tens of thousands of completed jobs in the history folder. The user typically does not remember the job id. What does the user do now? Does the history server allow the user to retrieve a list of jobs that he/she submitted? Does it allow a user to list jobs in reverse chronological order? Just asking.

          Amar Kamat added a comment -

          1. If a job is completed and retired, and then the JT as well as the History Server restarts, can a user get to the logs of a job that was completed earlier?

          As of now the History server simply provides a web interface to the job history files on the history-fs. It reads a history file, parses it, and allows users to analyze it. JobTracker restart will make sure that

          • the jobs that were marked completed remain untouched
          • the jobs that were running/pending are completed. This also includes maintaining the history files and making sure that in the end there is only one history file for a completed job

          Does the History Server keep some sort of a persistent index into the completed/failed jobs?

          No. It doesn't need to keep any. All the files are maintained in a job-history folder.

          dhruba borthakur added a comment -

          Hi Amar,

          I have a couple of questions about the history server:

          1. If a job is completed and retired, and then the JT as well as the History Server restarts, can a user get to the logs of a job that was completed earlier?

          2. Does the History Server keep some sort of a persistent index into the completed/failed jobs?

          Amar Kamat added a comment -

          I currently have deployed a fix for our cluster (pre 0.19) that retires most completed jobs when the memory usage on the JT reaches 90% of the max heap size. This could be of great help under times of memory pressure on the JT.
          This solution is possibly better than setting the number of completed jobs in memory to 0, because doing that will always retire jobs immediately. I would rather retire jobs during times of high memory pressure on the JT.

          HADOOP-4766 tried to do the same thing. Please look at the discussion there as to why jobs should not be retired based on memory usage. HADOOP-4766 can still be used to address memory issues in 0.19. Thoughts?

          dhruba borthakur added a comment -

          I currently have deployed a fix for our cluster (pre 0.19) that retires most completed jobs when the memory usage on the JT reaches 90% of the max heap size. This could be of great help under times of memory pressure on the JT.

          This solution is possibly better than setting the number of completed jobs in memory to 0, because doing that will always retire jobs immediately. I would rather retire jobs during times of high memory pressure on the JT.
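
          A minimal sketch of the heuristic dhruba describes (his actual fix is not attached here); retireCompletedJobs() is a hypothetical hook into the JT's retire logic:

              // Sketch only: retire completed jobs when heap usage crosses 90%.
              public class MemoryPressureRetirer {
                static final double THRESHOLD = 0.90;

                void maybeRetire() {
                  Runtime rt = Runtime.getRuntime();
                  long used = rt.totalMemory() - rt.freeMemory();
                  if ((double) used / rt.maxMemory() >= THRESHOLD) {
                    retireCompletedJobs();  // hypothetical hook into the JT
                  }
                }

                void retireCompletedJobs() { /* purge completed jobs from memory */ }
              }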

          Amar Kamat added a comment -

          We are on 0.17 (and 0.19) with long-running JTs, and one problem we are seeing is that the JT sometimes runs out of its 3GB heap space. The system is used by around 50-80 users. The max jobs per user before retirement is 5. But this still means that the JT keeps info about 80*5=400 completed jobs in memory. Sometimes these jobs have a huge number of tasks. This eats up most of the memory in the JT.

          Wouldn't making the number of completed jobs in memory = 0 help? But the problem is that the completed jobs will then be available only via history, and that will risk the jobtracker.

          I was wondering if any type of fix for this problem is going to be checked into the 0.19 branch?

          I think it will be a big change to go into 0.19, no? Comments?

          dhruba borthakur added a comment -

          @Amar: I was wondering if any type of fix for this problem is going to be checked into the 0.19 branch?

          Amar Kamat added a comment -

          Dhruba, in this issue we are not purging jobs based on memory usage. Here the jobs are purged as soon as they exceed some amount of time in the jobtracker, which by default is 1 min. There is a separate process that takes care of serving/displaying completed job info. HADOOP-4766 was opened to address what you want. Since completed jobs will no longer remain in memory, there is no memory management required.

          dhruba borthakur added a comment -

          We are on 0.17 (and 0.19) with long-running JTs, and one problem we are seeing is that the JT sometimes runs out of its 3GB heap space. The system is used by around 50-80 users. The max jobs per user before retirement is 5. But this still means that the JT keeps info about 80*5=400 completed jobs in memory. Sometimes these jobs have a huge number of tasks. This eats up most of the memory in the JT.

          Can part of this fix (i.e. purge all jobs from the completed queue when memory usage on the JT exceeds a configured threshold) be ported to 0.19? It is very useful to have it in 0.19; otherwise the JT just hangs and the cluster has to be restarted.

          Amar Kamat added a comment -

          Attaching a new patch with the following changes :

          1. JobHistoryServerProfile : The profile for the server. It contains server info that remains unchanged for the lifetime of the server.
          2. JobHistoryServerStatus : The status for the server. It contains server info that frequently changes.
          3. Min-job-retire-interval : Jobs are kept in the JobTracker's memory for some time before retiring. The parameter that controls this is mapred.jobtracker.retirejob.interval. After this interval the job is purged. There are several reasons for keeping/doing this :
            1. The JobClient periodically polls for job status, and immediately removing the job might result in exceptions.
            2. Test cases are based on the fact that the job status and reports are available after the job is complete.
          4. Testcase configuration now keeps jobs for 24 hrs just to make sure that a job is available after it finishes.
          5. Services in JobHistoryServer.java now follow the naming conventions suggested by Steve.
          6. Added a testcase to test if jobs are removed from memory.
          7. Added a testcase to test if jobhistory is properly served.

          Todo :

          1. Check if the jsp files in the webapps/job folder can be retained while providing access to each set (job and jobhistory) based on the type of server accessing it.
            1. What about index.html for jobhistory? As of now there is a separate index file for the JobHistoryServer.
            2. What about security and access to the jsp files? With this patch there is no way to accidentally allow access to the jobhistory files.
          2. Make JobHistoryServer resilient to memory issues. As of today loadhistory.jsp loads a job's history from the filesystem and caches the result in memory for subsequent access to the same job (sketched below). One important question to ask is: what if multiple users access the JobHistoryServer simultaneously?
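
          A hedged sketch of that load-and-cache behavior, with hypothetical types standing in for the patch's real ones; it shows why concurrent access matters: every cache miss re-reads and re-parses a history file, and nothing bounds the cache:

              // Hypothetical sketch: JobInfo and parse() are stand-ins.
              import java.io.IOException;
              import java.io.InputStream;
              import java.util.concurrent.ConcurrentHashMap;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              public class HistoryCache {
                static class JobInfo { /* parsed view of one job's history file */ }

                private final ConcurrentHashMap<Path, JobInfo> cache =
                    new ConcurrentHashMap<Path, JobInfo>();

                JobInfo get(FileSystem historyFs, Path historyFile) throws IOException {
                  JobInfo info = cache.get(historyFile);
                  if (info == null) {
                    InputStream in = historyFs.open(historyFile);
                    try {
                      info = parse(in);            // re-parse on every miss
                    } finally {
                      in.close();
                    }
                    cache.put(historyFile, info);  // unbounded growth under load
                  }
                  return info;
                }

                private JobInfo parse(InputStream in) { return new JobInfo(); }
              }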
          Arun C Murthy added a comment -

          When it's standalone, then it should certainly be started separately. But I think, for simplicity and back-compatibility, the default behavior might be for it to continue to run within the JobTracker JVM.

          +1, we both concur.

          Doug Cutting added a comment -

          > This is what the attached patch does. Review?

          The patch looks generally good to me. The change to conf/hadoop-site.xml.template seems accidental. I think we should not default mapred.job.history.http.address to 0.0.0.0:50040 (as in the patch) but rather leave it empty, so that job history continues to run in the JobTracker by default. We should also document in hadoop-default.xml that this is what's done when the value is empty, and change the implementation so that an empty string here is interpreted as null.
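
          A sketch of the empty-means-embedded convention Doug suggests, assuming the address is read through the Configuration API (names as in his comment):

              // Sketch: an empty mapred.job.history.http.address is interpreted
              // as null, i.e. no standalone server; history stays embedded.
              import org.apache.hadoop.conf.Configuration;

              public class HistoryAddress {
                static String historyHttpAddress(Configuration conf) {
                  String addr = conf.get("mapred.job.history.http.address", "");
                  return addr.trim().length() == 0 ? null : addr;
                }
              }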

          Doug Cutting added a comment -

          > you do not want the JobTracker starting the stand-alone JobHistoryServer

          When it's standalone, then it should certainly be started separately. But I think, for simplicity and back-compatibility, the default behavior might be for it to continue to run within the JobTracker JVM. We can add a main() routine, command line commands, etc. to run it separately and a config option to stop it from being run within the JobTracker JVM.

          Arun C Murthy added a comment -

          So JobTracker.main() can check the configuration to see if it should, in addition to starting the jobtracker, start the job history service.

          Doug, I agree with the overall direction of your proposal.

          However, it seems to me that you do not want the JobTracker starting the stand-alone JobHistoryServer; rather we should get the
          $ bin/hadoop jobtracker
          command to start the job-history server, which can then be managed (e.g. restarted on failure) by daemontools (http://cr.yp.to/daemontools.html) or init.d or some such. It's reasonably safe to restart the job-history server since it's stateless and read-only. Thoughts?

          Amar Kamat added a comment -

          Obviously, I'd like to see the new code integrated with HADOOP-3628. Could you use method names and a stub lifecycle that is compatible?

          +1.

          the JobTracker should not be the place that starts an external JVM, as there are JVM options, console logs and other lifecycle issues to consider - such as keeping the JobHistory daemon live while the JobTracker restarts, yet still being able to log and terminate the JobHistory daemon when required.

          I have implemented #4 from the comment here. It simply adds the jsps to the current webserver of the jobtracker. No external JVMs are spawned from the jobtracker. Probably what Doug meant was to start another webserver in the same JVM that listens on another port. For now I felt it is easier to simply add the jsps to the current webserver.

          This is a really good opportunity to start HtmlUnit tests for the JSP pages, especially considering the amount of Java code embedded into them.

          Currently I am fixing the ant tests, as jobs will be removed from the jobtracker as soon as they are done. I will start working on the test case for html/jsp as soon as this is done. Should the test case be addressed in a separate issue?

          Are there any security concerns in the proposed method or its implementation? I am looking into it too.

          steve_l added a comment -
          1. Obviously, I'd like to see the new code integrated with HADOOP-3628. Could you use method names and a stub lifecycle that is compatible?
          2. the JobTracker should not be the place that starts an external JVM, as there are JVM options, console logs and other lifecycle issues to consider - such as keeping the JobHistory daemon live while the JobTracker restarts, yet still being able to log and terminate the JobHistory daemon when required.
          3. This is a really good opportunity to start HtmlUnit tests for the JSP pages, especially considering the amount of Java code embedded into them.
          Amar Kamat added a comment -

          Attaching a patch that implements what I have mentioned here. Also, by default, if the job-history-server configuration is not passed, then jobhistory is served via the jobtracker webui.

          By default we might continue to run this service in the same JVM as the jobtracker, so we don't force the maintenance of another daemon on every installation. Only folks who have huge clusters need configure things so that this is run as a separate process, potentially on a separate host.

          This is what the attached patch does. Review?

          So JobTracker.main() can check the configuration to see if it should, in addition to starting the jobtracker, start the job history service. In both cases, it should use independent ports from the jobtracker.

          This is easy to do with the current patch. JobHistoryServer.java starts a webserver that serves the jobhistory jsps (i.e. makes jobhistory the main context of the webserver) and can be invoked within the same jvm. But it is simpler to add the jsp context to the current webserver, which is what this patch does. Comments?

          In general, it should be simple to run all of our daemons in a single JVM, or to mix-and-match them. This should require at most a custom main() routine per JVM. We use this kind of configuration for unit testing already.

          This will require more refactoring, and I feel it should be done in a separate (unification) jira; thoughts?

          Doug Cutting added a comment -

          By default we might continue to run this service in the same JVM as the jobtracker, so we don't force the maintenance of another daemon on every installation. Only folks who have huge clusters need configure things so that this is run as a separate process, potentially on a separate host. So JobTracker.main() can check the configuration to see if it should, in addition to starting the jobtracker, start the job history service. In both cases, it should use independent ports from the jobtracker.

          In general, it should be simple to run all of our daemons in a single JVM, or to mix-and-match them. This should require at most a custom main() routine per JVM. We use this kind of configuration for unit testing already.
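
          A rough sketch of the main() Doug describes; startJobTracker(), startHistoryServer(), and the boolean property name are hypothetical stand-ins, not the patch's actual code:

              // Hypothetical sketch: one JVM, two daemons, mixed by config.
              import org.apache.hadoop.conf.Configuration;

              public class JobTrackerMain {
                public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  startJobTracker(conf);
                  // Run the history service in-process unless configured
                  // otherwise; it still binds its own, independent port.
                  if (conf.getBoolean("mapred.job.history.embedded", true)) {
                    startHistoryServer(conf);
                  }
                }

                static void startJobTracker(Configuration conf) { /* stand-in */ }
                static void startHistoryServer(Configuration conf) { /* stand-in */ }
              }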

          Amar Kamat added a comment -

          Things to ponder :

          • What if the job-history-server information is not passed to the jobtracker, i.e. is missing?
            1. bail out
            2. start it from the jobtracker in a separate jvm
            3. ignore and continue
            4. support it in the jobtracker's webserver as done today

          I think (1) will be the simplest but too restrictive, (2) should be fine but complicated (overhead etc), (3) will be simpler but not user friendly, and (4) should be fine but risky.

          Amar Kamat added a comment -

          Here is one proposal :

          • Run job-history-server as a separate mapred daemon, similar to namenode and jobtracker. Start it after the namenode and before the jobtracker.
          • The jobtracker should be passed the server info via the conf, similar to how the namenode info is passed to it. Say mapred.job.history.server.address, which should be hostname:port.
          • The jobtracker passes this info to its web server, and the history link on the jobtracker web-ui points to this server
          • All the jsp code to do with the job history (analysis, loading etc) gets moved to a separate webapp folder, say history. Make sure that the history can no longer be accessed via the jobtracker web-ui
          • Retire jobs as soon as they finish
          • Make the job-history-server a standalone entity which can be used without the jobtracker, something like http://hostname:50040, redirecting to say jobhistory.jsp. Note that this will help in offline browsing of history even if the jobtracker is down or under maintenance. Also, in case of some server issue, this server can be restarted like any other daemon
          • Keep the heap of this server smaller than that of the other daemons (configurable)
          • By default run this server on the jobtracker machine (i.e. via hod etc). With a smaller heap size and a separate jvm, the jobtracker (process + host) will be safe from memory issues

          I have purposefully skipped minute details as they can be worked out later once we agree on the direction.

          Things to ponder :

          • Running the job-history-server on a separate machine would work if job-history is on hdfs, but what if the job-history is on the local-fs? As of now we can leave it to the cluster admin to make sure that the job-history-server runs with the right parameters, i.e. on the right machine and with the same parameters that will be passed to the jobtracker.
          • What if the job-history-server goes down after the jobtracker goes down? Either get the server up on the same address (host:port), or start it someplace else and restart the jobtracker with the new address. Is there some way to reload the jobtracker with new parameter values without restarting?

          Thoughts?


            People

            • Assignee: Amar Kamat
            • Reporter: Arun C Murthy
            • Votes: 0
            • Watchers: 12
