Droids / DROIDS-58

Implement a filter mechanism that allows intercepting every stage of the crawling process


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.3.0
    • Component/s: None
    • Labels: None

    Description

      Refer to this mailing-list thread:
      http://mail-archives.apache.org/mod_mbox/incubator-droids-dev/200906.mbox/%3Cbc1833ba0906250438t1464d1a0jd62cc4790b663938@mail.gmail.com%3E

      Assume the crawling process consists of these stages:
      1. Poll a link from the queue
      2. Fetch the entity
      3. Parse the entity
      4. Extract outlinks
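      The four stages above can be sketched as a plain crawl loop with no interception yet. This is a minimal sketch under stated assumptions, not the Droids API: the in-memory `web` map standing in for HTTP fetching, the `Page` record, and the whitespace-token "parser" are all hypothetical.

```java
import java.util.*;

// Hypothetical sketch of the four-stage crawl loop from DROIDS-58.
// The in-memory "web" map, Page record and method names are illustrative
// assumptions, not the Droids API.
public class CrawlLoopSketch {

    // Result of the parse stage (hypothetical representation: whitespace tokens).
    record Page(String url, List<String> tokens) {}

    private final Map<String, String> web;          // stands in for HTTP fetching
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();
    public final List<String> visited = new ArrayList<>();

    public CrawlLoopSketch(Map<String, String> web, String seed) {
        this.web = web;
        queue.add(seed);
        seen.add(seed);
    }

    public void run() {
        String link;
        while ((link = queue.poll()) != null) {        // 1. poll a link from the queue
            String entity = fetch(link);                // 2. fetch the entity
            if (entity == null) continue;               // not in the map: treat as a failed fetch
            Page page = parse(link, entity);            // 3. parse the entity
            visited.add(page.url());
            for (String out : extractOutlinks(page)) {  // 4. extract outlinks
                if (seen.add(out)) queue.add(out);      // enqueue each new link only once
            }
        }
    }

    String fetch(String link) {
        return web.get(link);
    }

    Page parse(String link, String entity) {
        return new Page(link, Arrays.asList(entity.trim().split("\\s+")));
    }

    List<String> extractOutlinks(Page page) {
        List<String> out = new ArrayList<>();
        for (String token : page.tokens()) {
            if (token.startsWith("http://")) out.add(token);
        }
        return out;
    }
}
```

      For example, with a web of http://a -> "hello http://b http://c", http://b -> "see http://a" and http://c -> "dead end http://d", seeded at http://a, `visited` ends as [http://a, http://b, http://c]. The link http://d is polled, but its fetch fails, so it is skipped.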

      We provide a mechanism to intercept the process at every stage. For example, a LinkFilter has a "public T polled(T link);" method: any filter may reject or transform a Link polled from the queue. Similar logic applies to fetching, parsing, and extracting outlinks.
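      A minimal sketch of how such a filter could be chained. Only the "T polled(T link)" shape comes from this issue. The convention that returning null means "reject", the `LinkFilterChain` class, and the two example filters (`STRIP_FRAGMENT`, `onlyHost`) are hypothetical assumptions for illustration.

```java
import java.util.List;

// Hypothetical sketch: only "T polled(T link)" is taken from DROIDS-58.
// Assumed convention: a filter returns a (possibly transformed) link,
// or null to reject it.
public class LinkFilterSketch {

    /** Intercepts links polled from the queue; return null to reject. */
    public interface LinkFilter<T> {
        T polled(T link);
    }

    /** Applies filters in order and stops at the first rejection. */
    public static final class LinkFilterChain<T> implements LinkFilter<T> {
        private final List<LinkFilter<T>> filters;

        public LinkFilterChain(List<LinkFilter<T>> filters) {
            this.filters = List.copyOf(filters);
        }

        @Override
        public T polled(T link) {
            T current = link;
            for (LinkFilter<T> filter : filters) {
                current = filter.polled(current);   // each filter sees the previous output
                if (current == null) return null;   // rejected: later filters never run
            }
            return current;
        }
    }

    // Hypothetical transforming filter: drops a "#fragment" suffix.
    public static final LinkFilter<String> STRIP_FRAGMENT = link -> {
        int hash = link.indexOf('#');
        return hash >= 0 ? link.substring(0, hash) : link;
    };

    // Hypothetical rejecting filter: keeps only links on the given host.
    public static LinkFilter<String> onlyHost(String host) {
        return link -> link.startsWith("http://" + host + "/") ? link : null;
    }
}
```

      With a chain of `STRIP_FRAGMENT` followed by `onlyHost("example.org")`, polling "http://example.org/a#top" yields "http://example.org/a", and polling "http://other.com/x" yields null. Because the chain is itself a LinkFilter, chains can be nested. Fetch, parse, and outlink filters could follow the same return-or-reject convention, although the issue does not specify their signatures.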


People

    • Assignee: Unassigned
    • Reporter: Mingfai Ma
    • Votes: 1
    • Watchers: 1
