Nutch / NUTCH-1518

session cookies support

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1
    • Fix Version/s: 2.2
    • Component/s: fetcher
    • Labels: None

    Description

      Some websites can only be crawled if the client stores cookies.
      For example, to fetch pages from a made-up site
      www.blala.com, we first have to request
      www.blala.com?username=x&password=y
      after which the browser stores a session cookie.
      Afterwards I can fetch all pages of the domain, such as www.blala.com/a/b/c.html.
      I would like a feature where we define, in a file, the domains that require a login and the URL used to log in to each of them.
      When Nutch encounters such a site, it would log in before fetching the page.
      The issue https://issues.apache.org/jira/browse/NUTCH-827
      is very similar to what I need.
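
To make the request concrete, here is a minimal sketch of the behaviour described above. It is written in Python rather than Nutch's Java and does not use any Nutch API. The rules-file format (one domain and login URL per line), the `http://` scheme, and the names `parse_login_rules` and `SessionFetcher` are all assumptions made for illustration only.

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlsplit

# Hypothetical rules file: "<domain> <login-url>" per line; lines starting with '#' are comments.
EXAMPLE_RULES = """
# domain        login URL
www.blala.com   http://www.blala.com/?username=x&password=y
"""


def parse_login_rules(text):
    """Map each domain to the URL that must be requested first to log in."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        domain, login_url = line.split(None, 1)
        rules[domain.lower()] = login_url.strip()
    return rules


class SessionFetcher:
    """Fetches pages, logging in once per domain that has a login rule."""

    def __init__(self, rules, opener=None):
        self.rules = rules
        self.logged_in = set()
        # A single cookie jar is shared by every request, so the session cookie
        # set by the login response is sent again on later fetches.
        self.opener = opener or urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def plan(self, url):
        """Return the URLs to request, in order, to fetch `url`."""
        host = (urlsplit(url).hostname or "").lower()
        steps = []
        if host in self.rules and host not in self.logged_in:
            steps.append(self.rules[host])  # log in before the first fetch
        steps.append(url)
        return steps

    def fetch(self, url):
        """Perform the planned requests and return the body of the last one."""
        host = (urlsplit(url).hostname or "").lower()
        body = b""
        for step in self.plan(url):
            with self.opener.open(step) as response:
                body = response.read()
        if host in self.rules:
            self.logged_in.add(host)  # later fetches reuse the stored cookie
        return body
```

With these rules, `SessionFetcher(parse_login_rules(EXAMPLE_RULES)).plan("http://www.blala.com/a/b/c.html")` lists the login URL first and the requested page second; once the domain is marked as logged in, only the page itself is requested. A real implementation would also need to detect expired sessions and failed logins, which this sketch ignores.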

            People

              Assignee: Unassigned
              Reporter: David Michael Gang (davidga)
              Votes: 0
              Watchers: 2
