  1. Nutch
  2. NUTCH-1760

Crawl script fails to find job file if called from outside bin dir


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 1.8, 2.2.1
    • None
    • None
    • Environment: Ubuntu 13.10 Server

    • Patch Available

    Description

      The crawl script that comes with every version of Nutch I have checked sets the local/distributed operating mode by testing for a job file at a relative path (i.e. "../nutch-.job").

      Bash seems to resolve this path relative to the directory the crawl script was called from, not the script's actual location.

      The result is that the script thinks it is in local mode whenever it is called from outside the bin directory, because it cannot find the job file. When a crawl is run, jobs are submitted to Hadoop properly, but the if statements that test for local (or distributed) mode take the wrong branch, which gives strange results or causes crashes.

      Using the first bash snippet from here I have modified the crawl script to look for a job file relative to the script location on disk.

      I have attached a patch with my modifications.
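The approach described above can be sketched roughly as follows. This is not the attached patch, only a minimal illustration that assumes the script is executed directly (so that "$0" holds its path; in bash, "${BASH_SOURCE[0]}" is a common alternative). The helper names script_dir and detect_mode and the "*.job" pattern are hypothetical and chosen for the example.

```shell
#!/bin/sh
# Print the absolute directory that contains the script path given in $1.
script_dir() {
  ( cd "$(dirname "$1")" && pwd )
}

# Print "distributed" if any *.job file exists one level above the
# directory given in $1, otherwise print "local".
# The *.job pattern is an assumption made for this illustration.
detect_mode() {
  for job in "$1"/../*.job; do
    if [ -f "$job" ]; then
      echo distributed
      return 0
    fi
  done
  echo local
}

# Resolve the job file relative to the script itself,
# not relative to the caller's current working directory.
SCRIPT_DIR="$(script_dir "$0")"
mode="$(detect_mode "$SCRIPT_DIR")"
echo "mode=$mode (script directory: $SCRIPT_DIR)"
```

With this arrangement the detected mode should not depend on the directory the script is called from. Note that "$0" is not resolved through symlinks, so a symlinked script would need extra handling.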


          People

            Assignee: Unassigned
            Reporter: David Hosking (david.hosking)