[NUTCH-2331] REST API Fetch fails to retrieve HDFS path on distributed mode - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15
Fix Version/s: None
Component/s: fetcher, REST_api
Labels: None

Description

Currently, when a REST API fetch request specifies only the crawlId and not the absolute path of the segment to fetch, the fetcher finds the most recently generated segment and uses that.

However, this lookup currently works only in local mode; see https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/fetcher/Fetcher.java#L562-L573.

These lines need to be updated so that the fetcher can read the directory and list its files on an HDFS file system as well.
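
As a rough illustration of the selection step only, the sketch below picks the latest segment from a list of directory entries without touching any particular file system. The names `LatestSegment`, `Entry`, and `latestSegment` are hypothetical and not part of Nutch. It assumes that "latest" means the greatest modification time, and that in distributed mode the entries would be built from the `FileStatus` objects returned by Hadoop's `FileSystem.listStatus(Path)` rather than from `java.io.File`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestSegment {

  /**
   * Hypothetical stand-in for one directory entry. On HDFS these fields
   * would presumably be filled from a Hadoop FileStatus (getPath(),
   * isDirectory(), getModificationTime()); that wiring is an assumption
   * and is not shown here.
   */
  public static final class Entry {
    final String path;
    final boolean isDirectory;
    final long modificationTime;

    public Entry(String path, boolean isDirectory, long modificationTime) {
      this.path = path;
      this.isDirectory = isDirectory;
      this.modificationTime = modificationTime;
    }
  }

  /**
   * Returns the path of the most recently modified directory, or an empty
   * Optional when the listing contains no directories.
   */
  public static Optional<String> latestSegment(List<Entry> entries) {
    return entries.stream()
        .filter(e -> e.isDirectory) // segments are directories
        .max(Comparator.comparingLong((Entry e) -> e.modificationTime))
        .map(e -> e.path);
  }

  public static void main(String[] args) {
    // Made-up listing of a crawl's segments directory, for illustration only.
    List<Entry> listing = Arrays.asList(
        new Entry("crawl-01/segments/20161019101500", true, 1000L),
        new Entry("crawl-01/segments/20161020224000", true, 2000L),
        new Entry("crawl-01/segments/_SUCCESS", false, 3000L));
    System.out.println(latestSegment(listing).orElse("no segment found"));
  }
}
```

Keeping the selection independent of how the listing is obtained would let the same logic serve both local and distributed mode; only the code that produces the entries would need to go through the Hadoop file system abstraction.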

People

Assignee: Sujen Shah

Reporter: Sujen Shah

Votes: 0

Watchers: 1

Dates

Created: 20/Oct/16 22:40

Updated: 22/Nov/19 13:22