Uploaded image for project: 'Droids'
  1. Droids
  2. DROIDS-77

Be able to modify URL rules while crawler is running

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Minor
    • Resolution: Duplicate
    • None
    • None
    • core
    • None

    Description

      It would be nice to be able to modify the URL rules while a crawler is running. This would allow me to dynamically exclude areas from being crawled based on results being returned. Basically I want to look for certain markers inside a page, then not crawl those pages without having update a robots file. Different paths of our site is going to enter into the index from a different method than the main crawl, so I can skip them once I find them.

      Having a modifiable filter would allow people to load their rules from places other than a file without having to write their own implementation or extension. I'll try to work up a patch sometime this week.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              rfrovarp Richard Frovarp
              Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: