Nutch / NUTCH-413

Fetcher ignores -noParsing command line option

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 1.0.0
    • Component/s: fetcher
    • Labels: None
    • Environment: Fedora Core 6, nutch 0.8.1

    Description

      I believe that the patch applied in NUTCH-337 broke the fetcher. The fetcher now ignores the -noParsing command-line option, and parsing occurs anyway.
      As far as I understand the Nutch code, I traced the problem as follows:

      In the Fetcher class, at line 473, -noParsing is evaluated properly and placed into a Configuration created by NutchConfiguration.create(). So far so good.

      In the same file, at line 280, the decision whether to parse depends on the local field "parsing". During execution, this field's value is true instead of false. The field is set to true by the method "configure", at line 357. The problem is that "configure" accepts a JobConf as a parameter, but the JobConf object actually passed to it is not the one used earlier at line 473.
      It is a different object. I think it is created at line 422, but I am not sure.
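
      To make the suspected mechanism concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use the Nutch or Hadoop classes: java.util.Properties stands in for Configuration/JobConf, and the class name, method names, and the "fetcher.parse" key are assumptions chosen for illustration only. It shows how a flag set on one configuration object is lost when a different object is handed to configure, and how copying the settings across avoids that.

```java
import java.util.Properties;

public class ConfigMismatchSketch {

    // Hypothetical key name, used here only for illustration.
    static final String PARSE_KEY = "fetcher.parse";

    /** Stand-in for "configure": reads the flag from whatever config it receives. */
    static boolean configure(Properties job) {
        // Falls back to true when the key is absent, so a missing flag means "parse".
        return Boolean.parseBoolean(job.getProperty(PARSE_KEY, "true"));
    }

    /** Suspected buggy flow: the flag is set on one object, another is passed on. */
    static boolean buggyFlow() {
        Properties conf = new Properties();
        conf.setProperty(PARSE_KEY, "false"); // -noParsing is recorded here
        Properties job = new Properties();    // separate object, flag never copied
        return configure(job);
    }

    /** One possible remedy: build the job config from the config carrying the flag. */
    static boolean fixedFlow() {
        Properties conf = new Properties();
        conf.setProperty(PARSE_KEY, "false");
        Properties job = new Properties();
        job.putAll(conf);                     // settings travel with the job config
        return configure(job);
    }

    public static void main(String[] args) {
        System.out.println("buggy flow parses: " + buggyFlow());
        System.out.println("fixed flow parses: " + fixedFlow());
    }
}
```

      In this sketch the buggy flow still parses because the default wins, while the fixed flow honours the flag. Whether the actual fix in Nutch takes this shape is not something the report establishes.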

          People

            Assignee: Unassigned
            Reporter: Jonathan Amir (jonathan_amir)