
Key: NUTCH-318
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Sami Siren
Reporter: Stefan Groschupf
Votes: 0
Watchers: 0
Nutch

log4j not properly configured, readdb doesn't give any information

Created: 11/Jul/06 02:05 AM   Updated: 24/Sep/06 03:30 PM
Component/s: None
Affects Version/s: 0.8
Fix Version/s: 0.8.1, 0.9.0

Time Tracking:
Not Specified

Resolution Date: 01/Aug/06 04:25 PM


Description
In the latest .8 sources the readdb command doesn't dump any information anymore.
This is related to the misconfigured log4j.properties file.
Changing:
log4j.rootLogger=INFO,DRFA
to:
log4j.rootLogger=INFO,DRFA,stdout
dumps the information to the console, but not in a nice way.
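
For reference, a minimal sketch of what the change above relies on: the name stdout in the rootLogger line only has an effect if an appender with that name is defined in the same file. The appender definition and conversion pattern below are assumptions for illustration, not copied from the Nutch 0.8 configuration.

```properties
# Root logger: daily rolling file (DRFA) plus console (stdout).
log4j.rootLogger=INFO,DRFA,stdout

# Assumed console appender definition (illustrative, not from the Nutch sources).
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# Assumed pattern: timestamp, level, short logger name, message.
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```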

What puzzles me is that this information should also be in the log file, but it isn't, so there may be problems there as well.
Also, what is the difference between hadoop-XXX-jobtracker-XXX.out and hadoop-XXX-jobtracker-XXX.log? Shouldn't there be just one of them?



Comments
Stefan Groschupf added a comment - 25/Jul/06 08:15 PM
Shouldn't that be fixed in 0.8, since as of today this tool just produces no output?!

Sami Siren added a comment - 26/Jul/06 05:59 AM
Perhaps this is happening in a distributed setup? In a single-machine setup the output goes to the log file; see NUTCH-315.

Stefan Groschupf added a comment - 26/Jul/06 06:26 AM
Yes, this happens only in a distributed environment. Please also see my last mail on the hadoop dev list. I think there are more general logging problems that only occur in a distributed environment, so you will not track them down using the local runner.

Andrzej Bialecki added a comment - 26/Jul/06 06:36 AM
I also think that producing no output on the console is confusing to new users, especially in the "local" mode. It's not immediately obvious where to look for the results of your commands, especially commands like 'readdb -stats', which users naturally expect to produce some output on the console.

Sami Siren added a comment - 26/Jul/06 06:42 AM
I agree, so the next thing to do is change readdb -stats to print to stdout. I'll go ahead and do that. Have any other commands been discovered that need to be changed in a similar way?

Andrzej Bialecki added a comment - 26/Jul/06 06:57 AM
Ok, go ahead, it makes sense in this case. However, there are many places where displaying INFO messages on the console also makes sense, again - especially in the local mode. Otherwise you have to start a new console, and do a 'tail -f logs/hadoop.log', which seems like an awkward and complicated way to see if your command makes any progress or is stuck.

So, I would vote for adding stdout for now. I know it produces messy output, and a lot of it - perhaps we could exclude some stuff from stdout, like plugin repository loading, config parsing etc. But my feeling is that at least some info should be displayed on the console.


Sami Siren added a comment - 26/Jul/06 07:12 AM
Could this be solved by just adding the following line to conf/log4j.properties?

log4j.logger.org.apache.nutch.crawl.CrawlDbReader=INFO,stdout

For me it produces the following output on stdout:
bin/nutch readdb ../nutch-0.8-release/crawl/crawldb -stats
2006-07-26 10:09:28,839 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(210)) - CrawlDb statistics start: ../nutch-0.8-release/crawl/crawldb
2006-07-26 10:09:31,203 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(261)) - Statistics for CrawlDb: ../nutch-0.8-release/crawl/crawldb
2006-07-26 10:09:31,204 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(277)) - TOTAL urls: 60
2006-07-26 10:09:31,206 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(272)) - avg score: 1.015
2006-07-26 10:09:31,206 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(270)) - max score: 1.103
2006-07-26 10:09:31,208 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(268)) - min score: 1.012
2006-07-26 10:09:31,209 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(277)) - retry 0: 60
2006-07-26 10:09:31,209 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(276)) - status 1 (DB_unfetched): 59
2006-07-26 10:09:31,211 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(276)) - status 2 (DB_fetched): 1
2006-07-26 10:09:31,212 INFO crawl.CrawlDbReader (CrawlDbReader.java:processStatJob(282)) - CrawlDb statistics: done

Of course it would look nicer if we also created another format for such cases (perhaps removing some unnecessary info).
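
One way the terser format mentioned above might look, as a sketch only: a second console appender whose pattern prints just the message, attached to the CrawlDbReader logger. The appender name cmdstdout and the pattern are assumptions for illustration, and not necessarily what ends up committed.

```properties
# Hypothetical console appender that prints only the message text.
log4j.appender.cmdstdout=org.apache.log4j.ConsoleAppender
log4j.appender.cmdstdout.layout=org.apache.log4j.PatternLayout
log4j.appender.cmdstdout.layout.ConversionPattern=%m%n

# Send CrawlDbReader INFO messages to that appender.
log4j.logger.org.apache.nutch.crawl.CrawlDbReader=INFO,cmdstdout
```

With a message-only pattern, the stats lines would be expected to appear as bare messages such as "TOTAL urls: 60". Because log4j loggers are additive by default, the same messages would still reach the root logger's appenders (DRFA here) as well.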


Sami Siren added a comment - 26/Jul/06 08:46 AM
I just committed some changes to the log4j configuration for some command-line tools to trunk. Is this a satisfactory solution to this problem from Nutch's side?

http://svn.apache.org/viewvc/lucene/nutch/trunk/conf/log4j.properties?r1=416100&r2=425675&diff_format=h


Andrzej Bialecki added a comment - 26/Jul/06 09:17 AM
Works for me - no further complaints on my side. Thanks!

Sami Siren added a comment - 01/Aug/06 04:25 PM
Marking this as resolved because it now works OK in a single-node config.