[NUTCH-1452] hadoop.job.history.user.location in nutch-default making job history useless - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Auto Closed
Affects Version/s: None
Fix Version/s: 2.5
Component/s: None
Labels: None

Description

nutch-default still contains the property 'hadoop.job.history.user.location', which redirects the creation of job history files from the job output location to a custom location. The current value does not work well with Cloudera (tested on cdh3u4), because ${hadoop.log.dir} is not defined there. As a result, the job shows empty info in the jobtracker (with 'incomplete' job status). This happens only once the job has moved to 'retired'; while it is still in 'completed', everything looks fine.

This property can be set to 'none', because the job history is also stored in the central jobtracker location anyway; 'hadoop.job.history.user.location' only specifies an extra location. However, if it is set to an invalid value, it seems to prevent the central history location from storing the history as well. For more details, see:
http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html

Besides setting it to 'none', another option is to set it to 'history', which does work with CDH. (This writes all logs to 'history' in the user directory on the configured filesystem, usually DFS.) The final option is to simply remove the property and not meddle with Hadoop properties at all. That, however, requires all jobs to correctly ignore these files, and I am not up to date on how well this currently works with Nutch jobs. This question is most relevant for trunk, since trunk relies heavily on the filesystem for jobs.

What do you think?
A) Set property to 'none'
B) Set property to 'history'
C) Remove property, see what happens, possibly fix jobs
D) ?

For now, I opt for A. But I think we need more input on other distributions (for example official Hadoop 1.x) and on Nutch trunk.
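
For illustration, option A could look roughly like the following entry in nutch-default.xml (or as an override in nutch-site.xml). This is a minimal sketch, not a tested patch: only the property name and the values 'none' and 'history' come from this issue, and the description text is illustrative wording.

```xml
<!-- Sketch of option A: disable the extra per-user job history location.
     The central jobtracker history is expected to remain unaffected. -->
<property>
  <name>hadoop.job.history.user.location</name>
  <value>none</value>
  <description>Illustrative wording: 'none' disables the additional
  user-level job history location, so no undefined variable such as
  ${hadoop.log.dir} is involved.</description>
</property>
```

For option B, the value would be 'history' instead of 'none'; for option C, the whole property entry would be removed.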

People

Assignee: Unassigned

Reporter: Ferdy

Votes: 0

Watchers: 2

Dates

Created: 14/Aug/12 08:30

Updated: 13/Oct/19 22:35

Resolved: 13/Oct/19 22:35
