Description
Some internet sites require a login before they can be crawled, because access depends on a session cookie.
For example, for a hypothetical site
www.blala.com we first have to fetch
www.blala.com?username=x&password=y
so that the server sets a session cookie.
Afterwards I can fetch all pages of the domain, e.g. www.blala.com/a/b/c.html
I want a feature where we define, in a configuration file, the domains and the URLs used to log in to them.
When Nutch encounters such a site, it should perform the login before fetching its pages.
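A minimal sketch of what such a configuration file could look like, mapping a domain to its login URL. The file name, element names, and attributes here are purely hypothetical, not an existing Nutch format:

```xml
<!-- Hypothetical login-config.xml: one entry per domain that needs a login.
     When the crawler is about to fetch a URL from a listed domain, it would
     first request the loginUrl and keep the resulting session cookie. -->
<login-config>
  <domain name="www.blala.com">
    <!-- credentials are illustrative placeholders -->
    <loginUrl>http://www.blala.com?username=x&amp;password=y</loginUrl>
  </domain>
</login-config>
```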
The JIRA issue https://issues.apache.org/jira/browse/NUTCH-827
is very similar to what I need.
Issue Links
- duplicates NUTCH-827 HTTP POST Authentication (Closed)