[NUTCH-2440] DbResource does not accept crawlid - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Auto Closed
Affects Version/s: 2.3, 2.4
Fix Version/s: 2.5
Component/s: REST_api
Labels:
None

Description

DbResource initializes its DbReader instances with a null crawlId. This prevents it from querying the correct table/collection when a crawlId was set during the fetch.

For example, in MongoDB all data is stored in the "webpage" collection by default. If you set the crawlId to "tech" for the fetch, all data is stored in the "tech_webpage" collection instead. However, a REST call to the /db endpoint has no way to specify a crawlId, so it always queries the "webpage" collection.
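To make the naming rule above concrete, here is a minimal Java model of it. The class `WebPageCollectionName` and its `resolve` method are hypothetical and not part of Nutch; the model assumes only what this issue describes: no crawlId maps to `webpage`, and crawlId `tech` maps to `tech_webpage`.

```java
/**
 * Hypothetical model (not Nutch code) of the collection naming described in
 * NUTCH-2440: without a crawlId the default "webpage" collection is used;
 * otherwise the crawlId is prepended with an underscore.
 */
final class WebPageCollectionName {
    static final String DEFAULT_SCHEMA = "webpage";

    private WebPageCollectionName() {}

    /** Returns the collection a job with the given crawlId reads or writes. */
    static String resolve(String crawlId) {
        if (crawlId == null || crawlId.isEmpty()) {
            return DEFAULT_SCHEMA;
        }
        return crawlId + "_" + DEFAULT_SCHEMA;
    }

    public static void main(String[] args) {
        // Collection the fetch job wrote to when crawlId was "tech".
        System.out.println(resolve("tech")); // tech_webpage
        // Collection the /db endpoint queries, since it always passes null.
        System.out.println(resolve(null));   // webpage
    }
}
```

The mismatch between those two printed names is exactly why the /db endpoint returns data from the wrong collection.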

I see two options: either DBFilter can be changed to accept a crawlId, or the resource path can include the crawlId. I am open to suggestions and can then open a PR.
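A minimal sketch of the first option, assuming the filter gains an optional crawlId field that the resource forwards to the reader instead of null. All class names here (`CrawlIdAwareDbQuery`, `FilterSketch`, `ReaderSketch`) are hypothetical stand-ins, not Nutch's DbResource, DBFilter or DbReader classes, and the collection naming follows the rule described in this issue.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not the Nutch API): the request filter carries an
 * optional crawlId, and the resource passes it to the reader instead of null.
 */
final class CrawlIdAwareDbQuery {

    /** Stand-in for the request filter; only the proposed crawlId field is modeled. */
    static final class FilterSketch {
        final String crawlId; // null when the client does not set it

        FilterSketch(String crawlId) {
            this.crawlId = crawlId;
        }
    }

    /** Stand-in for a reader bound to a single collection. */
    static final class ReaderSketch {
        final String collection;

        ReaderSketch(String crawlId) {
            // Naming rule as described in the issue: crawlId + "_webpage".
            this.collection = (crawlId == null || crawlId.isEmpty())
                    ? "webpage"
                    : crawlId + "_webpage";
        }
    }

    /** Behaviour reported in the issue: the crawlId is always null. */
    static ReaderSketch currentReader(FilterSketch filter) {
        return new ReaderSketch(null);
    }

    /** Proposed behaviour: forward the crawlId supplied in the filter. */
    static ReaderSketch proposedReader(FilterSketch filter) {
        Objects.requireNonNull(filter, "filter");
        return new ReaderSketch(filter.crawlId);
    }

    public static void main(String[] args) {
        FilterSketch filter = new FilterSketch("tech");
        System.out.println(currentReader(filter).collection);  // webpage
        System.out.println(proposedReader(filter).collection); // tech_webpage
    }
}
```

The second option would construct the reader the same way; only the source of the crawlId changes, for example a path parameter on a hypothetical `/db/{crawlId}` route instead of a field in the filter. Leaving the crawlId null when it is not supplied keeps existing clients querying the default collection.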

People

Assignee:: Unassigned

Reporter:: Tulay Muezzinoglu

Votes:: 0

Watchers:: 2

Dates

Created:: 12/Oct/17 00:12

Updated:: 13/Oct/19 22:35

Resolved:: 13/Oct/19 22:35