|
The output data may be deleted at any time once it is no longer needed. The log data may be needed long after the output data is deleted.
+1 on Eric's suggestion. The jobhistory viewer (web server with the job history related JSPs) can take the output directory as the input and populate the history data structures. This can be on a per-user basis for now (e.g., bin/hadoop jobhistoryview -output <dir> .. ), and, in the future, we could make the viewer a centralized web-enabled server that anyone can use. Thoughts?
As Eric suggested, this can be solved by allowing the user to move the files to some persistent location... BTW, is this issue the same as H-1876? To address both the issues (
The job history viewer (centralized) can display the available history files sorted by creation time on the DFS, and the user can select the job he wants to view from there. For 1876, job status can be queried by giving the jobid, hostname and username. The hostname can be inferred; the username, if not specified, will be ignored and the first occurrence of the jobid string taken. I don't think it makes sense to address
I think it is better to keep the server logs on the local machine by default and allow a configuration to store them in a well-specified place on HDFS. This will keep a HOD user from creating leavings all over the HDFS, but allow for alternate uses like
We have 3 use cases we need to think about separately:
1) A cluster with one permanent JT. This should have a single well-known place it logs.
2) A HOD JT. This should send logs to a user-specified directory.
3) Central logging of job summary data that works in either of the above cases. We should create a distinct JIRA to discuss this.
Makes sense, I think. Keeps things simple.
+1.
This is much clearer. Need coordination in implementing this and
To address use cases 1 and 2 suggested by Eric, I propose the following approach.
If the job tracker is static, we will store history logs in the location specified by hadoop.job.history.location, which defaults to the local file system. We will not have an index file any more, because appending becomes an issue in DFS. And we don't need one in the case of a non-static JT.
Even with a HOD JT, we still need to address case 3. That is,
The user vs admin dimension is not really the same as the static vs HOD dimension.
Even in a static JT, the job history is probably usefully part of the output directory (for a user). I think this should just be part of the output API, no matter how the cluster is configured. This will be much easier to document and use. Then we could handle static job trackers as Amareshwari describes. In the case of HOD-deployed JTs, I think we can then set hadoop.job.history.location either to NULL or to a HOD-specified output directory, probably on HDFS or another shared FS. This would be useful if the user is running a lot of jobs through a single HOD instance. NULL is probably a fine default.
I agree with Runping that we need to define an API for collecting central stats from HOD-deployed JTs. I think a configured URL is OK as an API, but we need to be clear that this output will be for central collection, not user diagnostics, and as such the layout should be optimized to simplify that (probably time sorted, not user sorted, for example). Ideally this could be a single file per JT instance.
In the case of HOD, the default value of hadoop.job.history.location can be the local file system, so that HOD can collect logs at cluster shutdown as it does today.
This patch logs history in hadoop.job.history.location, whose default value is the local file system. In addition, it logs at hadoop.job.history.user.location, whose default value is the user output directory. The history JSP files are now in a separate webapp, i.e. src/webapps/history rather than src/webapps/job. We now have a centralised location for history, "hadoop.job.history.location".
Trying Hudson with the correct patch.
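The two settings described above can be illustrated with a small lookup. This is only a sketch under stated assumptions: it uses java.util.Properties as a stand-in for Hadoop's configuration object, the class and method names are hypothetical, and ASSUMED_LOCAL_DEFAULT is a placeholder because the thread does not give the actual local default path.

```java
import java.util.Properties;

/** Sketch only: resolves the two history locations named in the comment above. */
final class HistoryLocationSketch {
    // Property names as given in the thread.
    static final String CENTRAL_KEY = "hadoop.job.history.location";
    static final String USER_KEY = "hadoop.job.history.user.location";

    // Hypothetical placeholder: the thread only says "local file system".
    static final String ASSUMED_LOCAL_DEFAULT = "file:///tmp/hadoop-history";

    /** Central (JobTracker-side) history location; falls back to the local default. */
    static String centralLocation(Properties conf) {
        return conf.getProperty(CENTRAL_KEY, ASSUMED_LOCAL_DEFAULT);
    }

    /** Per-user history location; falls back to the job's output directory. */
    static String userLocation(Properties conf, String jobOutputDir) {
        return conf.getProperty(USER_KEY, jobOutputDir);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(centralLocation(conf));
        System.out.println(userLocation(conf, "/user/alice/out"));
    }
}
```

Setting either property overrides the corresponding default; leaving both unset gives the behaviour the patch description calls the default.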
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373372/patch-2178.txt against trunk revision r612995.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1630/testReport/
This message is automatically generated.
Looking into test failures.
Submitting again for ant tests.
Canceling patch for review comments from Devaraj.
Submitting patch with comments incorporated and tested.
TestPipes needs a pathFilter.
Submitting again with TestPipes fix.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373528/patch-2178.txt against trunk revision r613115.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1645/testReport/
This message is automatically generated.
All tests passed on my machine with this patch.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373591/patch-2178.txt against trunk revision r613499.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1663/testReport/
This message is automatically generated.
Adding documentation.
Submitting patch with documentation added to cluster_setup and mapred_tutorial. There is no change in the code.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373741/patch-2178.txt against trunk revision r614192.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests -1. The patch failed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1676/testReport/
This message is automatically generated.
org.apache.hadoop.dfs.TestCrcCorruption.testCrcCorruption failed because datanodes are bad.
This is currently being investigated in
Some comments (sorry for being late on this):
1) Some of the configurable items are not listed in hadoop-default.xml. For example, mapred.job.history.http.bindAddress.
2) mapred.job.history.viewer seems redundant.
3) The documentation should explicitly call out the cases where history files are created in the output directory. Something along the lines of having to write a filter for listing the output directory, especially for cases where the output dir would be consumed by a subsequent MR job. I think it makes sense to have the filter prepackaged as part of Hadoop. That way every user doesn't have to implement the same thing.
4) The documentation should mention that the user should start a browser and connect to the host:port that he gets when he starts the viewer.
Submitting patch with Devaraj's comments incorporated.
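Comment 3 above asks for a ready-made filter so that a job consuming this output directory skips the history files. A minimal sketch of such a filter follows, written in plain Java rather than against Hadoop's PathFilter interface so that it runs without Hadoop on the classpath. It assumes the history is stored under an underscore-prefixed entry, as proposed elsewhere in this thread; the class and method names are hypothetical, and this is not the OutputLogFilter from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hides underscore-prefixed entries (e.g. _logs) from an output listing. */
final class UnderscoreFilterSketch {

    /** Accepts a path unless its final component starts with an underscore. */
    static boolean accept(String path) {
        // Drop a single trailing slash so "/out/_logs/" is treated like "/out/_logs".
        String trimmed = (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
        // lastIndexOf returns -1 when there is no slash, so the whole string is the name.
        String name = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        return !name.startsWith("_");
    }

    /** Returns only the accepted entries, preserving their order. */
    static List<String> filter(List<String> listing) {
        List<String> kept = new ArrayList<>();
        for (String path : listing) {
            if (accept(path)) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

A follow-on job would apply the filter to the listing of the previous job's output directory before treating the entries as input.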
Along with the comments, this patch addresses a couple of things: patch not in sync with trunk.
Submitting patch in sync with trunk.
Sorry, OutputLogFilter was not added to svn
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373921/patch-2178.txt against trunk revision 614721.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1665/testReport/
This message is automatically generated.
Sorry again, userLogDir == "none" should use the equals method.
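The point about == versus equals can be shown in isolation. The sketch below is not the patch's code; the class and method names are hypothetical, and only the variable name userLogDir and the value "none" come from the comment above. In Java, == on strings compares object references, so a value read from configuration at run time will generally not be the same object as the literal "none" even when the characters match.

```java
/** Sketch only: why a "none" check should use equals rather than ==. */
final class NoneCheckSketch {

    /** Buggy form: true only if both sides are the very same String object. */
    static boolean isNoneByReference(String userLogDir) {
        return userLogDir == "none";
    }

    /** Corrected form: compares contents, and is safe when userLogDir is null. */
    static boolean isNone(String userLogDir) {
        return "none".equals(userLogDir);
    }

    public static void main(String[] args) {
        // new String(...) mimics a value built at run time, e.g. parsed from a config file.
        String fromConfig = new String("none");
        System.out.println(isNoneByReference(fromConfig));
        System.out.println(isNone(fromConfig));
    }
}
```

Putting the literal on the left of equals avoids a NullPointerException when the setting is absent.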
Since this is a 0.17 fix, I'm marking this "open" to get it out of our long patch build queue on Hudson. Please mark it "patch available" once we branch 0.16.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12374003/patch-2178.txt against trunk revision 615723.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac -1. The applied patch generated 608 javac compiler warnings (more than the trunk's current 607 warnings).
release audit -1. The applied patch generated 212 release audit warnings (more than the trunk's current 207 warnings).
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1710/testReport/
This message is automatically generated.
The test failure is not related to the patch.
The only release audit warning that matters is that src/java/org/apache/hadoop/mapred/OutputLogFilter.java is missing a license header. Also, did you look at the new javac warning?
Canceling patch to fix Hudson warnings.
Submitting patch again. Now all the tests pass after fixing
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12374749/patch-2178.2.txt against trunk revision 616796.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit -1. The applied patch generated 179 release audit warnings (more than the trunk's current 175 warnings).
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1740/testReport/
This message is automatically generated.
Sorry, this patch doesn't apply cleanly anymore. Please regenerate the patch.
Regenerated patch with trunk
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375810/patch-2178.txt against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 19 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler warnings.
release audit -1. The applied patch generated 180 release audit warnings (more than the trunk's current 176 warnings).
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1812/testReport/
This message is automatically generated.
Release audit warnings are due to jsp files.
I just committed this. Thanks, Amareshwari!
Integrated in Hadoop-trunk #406 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/406/ )
|
I'm not a real fan of hidden directories like these. The user will not know of them and will potentially fill a lot of disk/name space with never-viewed material. I'd be much happier if job history were considered part of the output of a job, unless configured otherwise. I.e., put it in the map-reduce output directory in a file or directories prefixed with an underscore. So <output>/_jobHistory or perhaps <output>/_logs/history.
We added the convention that map-reduce ignores underscore prefixed files specifically to allow this use case...
This also reduces jobid/name confusion, since the history is directly associated with the job's output.
We could then provide an option to put it in another location if the user desires.
Thoughts?