  1. Nutch
  2. NUTCH-1518

session cookies support


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1
    • Fix Version/s: 2.2
    • Component/s: fetcher
    • Labels:
      None

      Description

      Some websites can only be crawled if the client stores cookies.
      For example, to fetch pages from a made-up site
      www.blala.com, we first have to request
      www.blala.com?username=x&password=y
      and then the browser stores a session cookie.
      Afterwards I can fetch all pages of the domain, such as www.blala.com/a/b/c.html.
      I would like a feature where we define in a file the domains and, for each one, the URL used to log in.
      When Nutch encounters such a site, it would log in first and then fetch the page.
      The issue https://issues.apache.org/jira/browse/NUTCH-827
      is very similar to what I need.
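
      The requested flow (look up a login URL for the page's domain, log in, then reuse the session cookie) could be sketched roughly as below. This is only an illustration using the JDK's java.net.CookieManager, not Nutch code. The class name, the method names, the "<domain> <login-url>" file format, and the JSESSIONID cookie are all assumptions made for the example. The login response is simulated with a Set-Cookie header, so no network access is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Nutch.
class SessionLoginSketch {

    // Parses an assumed rules format: one "<domain> <login-url>" pair per line.
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, String> parseLoginRules(String text) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected '<domain> <login-url>': " + line);
            }
            rules.put(parts[0].toLowerCase(), parts[1]);
        }
        return rules;
    }

    // Returns the login URL configured for the page's host, or null if there is none.
    static String loginUrlFor(Map<String, String> rules, String pageUrl) {
        String host = URI.create(pageUrl).getHost();
        return host == null ? null : rules.get(host.toLowerCase());
    }

    // Stores the Set-Cookie value of a (simulated) login response in a cookie jar
    // and returns the Cookie header value the jar would send for pageUrl.
    static String cookieHeaderAfterLogin(String loginUrl, String setCookie, String pageUrl) {
        CookieManager jar = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        try {
            jar.put(URI.create(loginUrl),
                    Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookie)));
            Map<String, List<String>> headers =
                    jar.get(URI.create(pageUrl), Collections.<String, List<String>>emptyMap());
            List<String> cookies = headers.get("Cookie");
            return (cookies == null || cookies.isEmpty()) ? "" : String.join("; ", cookies);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> rules = parseLoginRules(
                "# domain        login URL\n"
              + "www.blala.com   http://www.blala.com/?username=x&password=y\n");
        String page = "http://www.blala.com/a/b/c.html";
        String login = loginUrlFor(rules, page);
        System.out.println("login first at: " + login);
        // The Set-Cookie value stands in for what the site would return on login.
        System.out.println("cookie sent with page request: "
                + cookieHeaderAfterLogin(login, "JSESSIONID=abc123; Path=/", page));
    }
}
```

      In a real fetcher the login request would be sent over HTTP and the cookie jar kept per domain for the length of the crawl; the sketch only shows the lookup and the cookie round trip.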


              People

              • Assignee:
                Unassigned
                Reporter:
                davidga David Michael Gang
              • Votes:
                0
                Watchers:
                2
