Nutch / NUTCH-2524

bin/crawl: fix check for HostDb in distributed mode


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.15
    • Component/s: bin
    • Labels: None

    Description

      In the crawl script (bin/crawl) you can find something like:
      if [[ -d "$CRAWL_PATH"/hostdb ]]; then
      echo "Processing sitemaps based on hosts in HostDB"
      __bin_nutch sitemap "$CRAWL_PATH"/crawldb -hostdb "$CRAWL_PATH"/hostdb -threads $NUM_THREADS
      fi

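      The test [[ -d "$CRAWL_PATH"/hostdb ]] only checks the local file system, so it works in local mode but not in distributed mode, where the HostDb is stored on HDFS. In distributed mode the check fails and the sitemap step is skipped even when the HostDb exists.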
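      One way to make the check mode-aware is to wrap it in a helper that uses the local test in local mode and "hadoop fs -test -d" (which exits 0 if the path is a directory) otherwise. This is a minimal sketch, not necessarily the change committed for this issue: the helper name __directory_exists and the variables MODE and HADOOP_CMD are hypothetical and may not match what bin/crawl actually uses.

```shell
#!/bin/sh
# Sketch only: MODE, HADOOP_CMD and __directory_exists are hypothetical
# names, not necessarily the ones used in the real bin/crawl script.
MODE="${MODE:-local}"              # "local" or "distributed"
HADOOP_CMD="${HADOOP_CMD:-hadoop}" # Hadoop launcher used in distributed mode

# Return 0 if directory "$1" exists on the file system the crawl uses:
# the local disk in local mode, the default Hadoop FS (e.g. HDFS) otherwise.
__directory_exists() {
  if [ "$MODE" = "local" ]; then
    [ -d "$1" ]
  else
    "$HADOOP_CMD" fs -test -d "$1"
  fi
}

# Usage, replacing the local-only test quoted above:
#   if __directory_exists "$CRAWL_PATH"/hostdb; then
#     echo "Processing sitemaps based on hosts in HostDB"
#     __bin_nutch sitemap "$CRAWL_PATH"/crawldb -hostdb "$CRAWL_PATH"/hostdb -threads "$NUM_THREADS"
#   fi
```

      Delegating to the Hadoop shell in distributed mode means the path is resolved against the same file system the Nutch jobs read from, instead of the disk of the machine running the script.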


    People

      Assignee: Unassigned
      Reporter: Semyon Semyonov (semyon.semyonov@mail.com)
      Votes: 0
      Watchers: 3
