NUTCH-2331

REST API Fetch fails to retrieve HDFS path on distributed mode

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.15
    • None
    • Component/s: fetcher, REST_api
    • None

    Description

      Currently in the REST API, if the user specifies only the crawlId and not the absolute path of the segment to fetch, the fetcher finds the most recently generated segment and uses that.

      However, this currently works only in local mode; see https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/fetcher/Fetcher.java#L562-L573.

      These lines need to be updated so that the fetcher can read the segments directory and list its files on HDFS as well.
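
      A minimal sketch of the selection logic, written so that it does not depend on where the directory listing comes from. This is not code from Nutch: the class LatestSegmentSketch, the Entry type, the DirectoryLister interface and the crawlId + "/segments" layout are all assumptions made for illustration. In an actual fix the lister would presumably be backed by Hadoop's FileSystem.listStatus(Path), with the FileSystem obtained through Path.getFileSystem(Configuration), so that the same code resolves to the local file system or to HDFS depending on the job configuration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; none of these names exist in Nutch. */
class LatestSegmentSketch {

  /** Stand-in for the few fields of a directory entry that are needed here. */
  static final class Entry {
    final String path;
    final boolean directory;
    final long modificationTime; // milliseconds since the epoch

    Entry(String path, boolean directory, long modificationTime) {
      this.path = path;
      this.directory = directory;
      this.modificationTime = modificationTime;
    }
  }

  /**
   * Hypothetical abstraction over "list the children of a directory".
   * A Hadoop-backed implementation would have to handle IOException itself.
   */
  interface DirectoryLister {
    List<Entry> list(String dir);
  }

  /**
   * Returns the path of the most recently modified sub-directory of
   * crawlId/segments (assumed layout), or empty if there is none.
   */
  static Optional<String> latestSegment(String crawlId, DirectoryLister lister) {
    String segmentsDir = crawlId + "/segments";
    return lister.list(segmentsDir).stream()
        .filter(e -> e.directory) // ignore plain files
        .max(Comparator.comparingLong((Entry e) -> e.modificationTime))
        .map(e -> e.path);
  }

  public static void main(String[] args) {
    // In-memory lister standing in for a real file system.
    DirectoryLister fake = dir -> Arrays.asList(
        new Entry(dir + "/20170101000000", true, 1000L),
        new Entry(dir + "/20170102000000", true, 2000L),
        new Entry(dir + "/_SUCCESS", false, 3000L));
    System.out.println(latestSegment("crawl01", fake).orElse("<no segments>"));
  }
}
```

      Keeping the listing behind one small interface means the "pick the latest segment" rule stays identical in local and distributed mode; only the lister implementation would differ.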

          People

            sujenshah Sujen Shah
            sujenshah Sujen Shah