Droids / DROIDS-72

Doesn't honor base element

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Graduating from the Incubator
    • Fix Version/s: 0.1.0
    • Component/s: core
    • Labels: None

    Description

      The HtmlParser and LinkExtractor do not honor the base element in HTML, so relative links are resolved against the page's own URI instead of the URI declared in the base element. This makes crawling some sites impossible. LinkExtractor and HtmlParser should accept an element/attribute pair to look for when determining the base URI.
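      The requested behavior can be sketched in plain Java. This is not the Droids HtmlParser or LinkExtractor API, and it is not the attached patch: the class BaseUriResolver and its methods are hypothetical names used for illustration. It assumes the element/attribute pair is configurable (for example "base"/"href"), takes the first matching element as the base (as HTML does), resolves a relative base against the document URI, and then resolves each link against that effective base with java.net.URI.resolve. The regex scan is a simplification; a real implementation would read the attribute from the parsed DOM, and unquoted attribute values are not handled here.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Droids: finds a base URI via a configurable
// element/attribute pair and resolves links against it.
public class BaseUriResolver {
    private final Pattern basePattern;

    public BaseUriResolver(String element, String attribute) {
        // Matches e.g. <base ... href="..."> with single or double quotes.
        // The \b after the element name keeps "base" from matching "basefont".
        this.basePattern = Pattern.compile(
                "<" + Pattern.quote(element) + "\\b[^>]*?\\b" + Pattern.quote(attribute)
                        + "\\s*=\\s*[\"']([^\"']*)[\"']",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns the base URI declared in the page, or the document URI if none.
    // Throws IllegalArgumentException if the declared base is not a valid URI.
    public URI effectiveBase(URI documentUri, String html) {
        Matcher m = basePattern.matcher(html);
        if (m.find()) {
            // Only the first matching element counts; a relative base href
            // is itself resolved against the document URI.
            return documentUri.resolve(m.group(1).trim());
        }
        return documentUri;
    }

    // Resolves a (possibly relative) link found in the page.
    public URI resolveLink(URI documentUri, String html, String href) {
        return effectiveBase(documentUri, html).resolve(href.trim());
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.org/a/page.html");
        String html = "<html><head><base href=\"http://cdn.example.org/docs/\"></head>"
                + "<body><a href=\"guide.html\">Guide</a></body></html>";
        BaseUriResolver resolver = new BaseUriResolver("base", "href");
        // prints http://cdn.example.org/docs/guide.html
        System.out.println(resolver.resolveLink(page, html, "guide.html"));
    }
}
```

      Without the base element, the same link would resolve to http://example.org/a/guide.html, which is the wrong target on sites that rely on base.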

      Attachments

        1. read-base-uri.patch
          4 kB
          Richard Frovarp
        2. read-base-uri-2.patch
          2 kB
          Richard Frovarp


            People

              Assignee: Unassigned
              Reporter: Richard Frovarp (rfrovarp)
              Votes: 0
              Watchers: 0
