
Key: NUTCH-413
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Jonathan Amir
Votes: 0
Watchers: 0
Nutch

Fetcher ignores -noParsing command line option

Created: 07/Dec/06 11:10 PM   Updated: 10/Apr/09 12:29 PM
Component/s: fetcher
Affects Version/s: 0.8.1
Fix Version/s: 1.0.0

Time Tracking:
Not Specified

Environment: Fedora Core 6, Nutch 0.8.1

Resolution Date: 22/Sep/08 04:20 PM


Description
I believe that the patch applied in NUTCH-337 broke the fetcher. The fetcher now ignores the -noParsing command-line option: parsing occurs anyway.
To the best of my understanding of Nutch, I traced the problem in the code as follows:

In the Fetcher class, at line 473, -noParsing is evaluated properly and placed into a Configuration created by NutchConfiguration.create(). So far so good.

In the same file, at line 280, the decision whether to parse depends on the field "parsing". During execution, this field's value is true instead of false. The field is set to true by the "configure" method, at line 357. The problem is that "configure" accepts a JobConf as a parameter, but the JobConf actually passed to it is not the object used earlier at line 473.
It is a different object. I think it is created at line 422, but I am not sure about that.
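The failure mode described above (an option written into one configuration object while configure() reads from a different one) can be reproduced in a small self-contained sketch. SimpleConf, SketchFetcher and NoParsingSketch are hypothetical stand-ins, not Nutch or Hadoop classes. The key fetcher.parse is taken from the property discussed in this issue and is assumed to be what -noParsing controls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Configuration/JobConf: a string map with a
// boolean getter that falls back to a default when the key is absent.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    SimpleConf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}

// Hypothetical, simplified fetcher: only the "parsing" flag is modelled.
class SketchFetcher {
    private boolean parsing;

    // Reads the flag from whatever object is passed in, defaulting to true.
    void configure(SimpleConf job) {
        this.parsing = job.getBoolean("fetcher.parse", true);
    }

    boolean isParsing() {
        return parsing;
    }
}

public class NoParsingSketch {
    // Buggy pattern: the option lands in one conf, but configure()
    // receives a freshly created one, so the default (true) wins.
    static boolean buggyRun(boolean noParsing) {
        SimpleConf cliConf = new SimpleConf();
        if (noParsing) {
            cliConf.set("fetcher.parse", "false");
        }
        SimpleConf jobConf = new SimpleConf(); // a different object
        SketchFetcher fetcher = new SketchFetcher();
        fetcher.configure(jobConf);
        return fetcher.isParsing();
    }

    // Fixed pattern: the same conf that received the option is passed on.
    static boolean fixedRun(boolean noParsing) {
        SimpleConf conf = new SimpleConf();
        if (noParsing) {
            conf.set("fetcher.parse", "false");
        }
        SketchFetcher fetcher = new SketchFetcher();
        fetcher.configure(conf);
        return fetcher.isParsing();
    }

    public static void main(String[] args) {
        System.out.println("buggy, -noParsing: parsing=" + buggyRun(true));
        System.out.println("fixed, -noParsing: parsing=" + fixedRun(true));
    }
}
```

Running main prints parsing=true for the buggy path and parsing=false for the fixed path. The fix in this model is simply to thread one configuration object through from option parsing to configure().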



Comments
Doğacan Güney added a comment - 08/Dec/06 01:59 PM
Are you sure about this? Running the fetcher (latest trunk) with the -noParsing option does not create any parse segments, while running the fetcher without it does create them. I even put the fetcher.parse property in nutch-site.xml (assuming that nutch-site overrides command-line options), and it still works as expected.

Jonathan Amir added a comment - 08/Dec/06 03:11 PM
I didn't check out trunk; I checked out the 0.8.1 tag because I wanted stability. If it is fixed in trunk, then I guess you can close this issue.
By the way, I wouldn't assume that nutch-site overrides command-line options. If it does, that is wrong: it should be the other way around, with command-line options overriding nutch-site.
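The precedence argued for in this comment (command-line options over nutch-site, nutch-site over built-in defaults) can be modelled as a layered lookup in which the most specific layer wins. This is a hypothetical sketch: LayeredConf and its methods are illustrative and are not the Hadoop Configuration API, and mapping -noParsing to fetcher.parse=false is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical three-layer lookup: defaults < site file < command line.
public class LayeredConf {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, String> site = new HashMap<>();
    private final Map<String, String> commandLine = new HashMap<>();

    void setDefault(String key, String value) { defaults.put(key, value); }
    void setSite(String key, String value) { site.put(key, value); }
    void setCommandLine(String key, String value) { commandLine.put(key, value); }

    // The most specific layer that defines the key wins; null if none does.
    String get(String key) {
        if (commandLine.containsKey(key)) {
            return commandLine.get(key);
        }
        if (site.containsKey(key)) {
            return site.get(key);
        }
        return defaults.get(key);
    }

    public static void main(String[] args) {
        LayeredConf conf = new LayeredConf();
        conf.setDefault("fetcher.parse", "true");
        conf.setSite("fetcher.parse", "true");         // value from a site file
        conf.setCommandLine("fetcher.parse", "false"); // assumed effect of -noParsing
        System.out.println("fetcher.parse = " + conf.get("fetcher.parse"));
    }
}
```

Here main prints fetcher.parse = false: the command-line layer is consulted first, so a site-file value cannot silently undo an explicit option.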

Doğacan Güney added a comment - 08/Dec/06 08:15 PM
About command-line options: that is not what I meant (I am not a native speaker). I meant that I also set fetcher.parse to true in nutch-site to see whether there is a bug in that code.

Andrzej Bialecki added a comment - 22/Sep/08 04:20 PM
This is indeed already fixed in trunk.