Nutch / NUTCH-3000

protocol-selenium returns only the body and strips off the <head/> element


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.20
    • Component/s: protocol
    • Labels: None

    Description

      The selenium protocol returns only the body portion of the HTML, which means that neither the title nor any other page metadata in the <head/> section is extracted.

      String innerHtml = driver.findElement(By.tagName("body"))
                              .getAttribute("innerHTML");
      

      We should return the full html, no?
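
      One possible way to act on this suggestion, sketched in Java under stated assumptions: instead of reading the innerHTML of <body>, serialize the whole <html> element via document.documentElement.outerHTML, so <title> and <meta> survive. With a real Selenium WebDriver this would be ((JavascriptExecutor) driver).executeScript(...), or simply driver.getPageSource(). So that the sketch runs without a browser or the Selenium jar, ScriptRunner is a hypothetical stand-in whose single method mirrors JavascriptExecutor.executeScript(String, Object...); FullHtmlSketch and fetchFullHtml are likewise illustrative names, not Nutch code, and this is not necessarily how the issue was actually fixed.

```java
import java.util.Map;

/**
 * Sketch: return the whole rendered document rather than only <body>.
 * ScriptRunner is a HYPOTHETICAL stand-in mirroring the signature of
 * Selenium's JavascriptExecutor.executeScript(String, Object...), so this
 * compiles and runs without a browser or the Selenium jar.
 */
public class FullHtmlSketch {

    /** Hypothetical stand-in for org.openqa.selenium.JavascriptExecutor. */
    @FunctionalInterface
    interface ScriptRunner {
        Object executeScript(String script, Object... args);
    }

    /** Behaviour described in the issue: only the children of <body>. */
    static final String BODY_INNER_HTML = "return document.body.innerHTML;";

    /** Proposed: the <html> element itself, so <head> (title, meta) survives. */
    static final String DOCUMENT_OUTER_HTML =
        "return document.documentElement.outerHTML;";

    /** Returns the serialized current DOM, or "" if the script yields null. */
    static String fetchFullHtml(ScriptRunner browser) {
        Object result = browser.executeScript(DOCUMENT_OUTER_HTML);
        return result == null ? "" : result.toString();
    }

    public static void main(String[] args) {
        // Fake "browser" holding one rendered page, keyed by the script it is asked to run.
        Map<String, String> page = Map.of(
            BODY_INNER_HTML, "<p>Hello</p>",
            DOCUMENT_OUTER_HTML,
            "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>");
        ScriptRunner fake = (script, scriptArgs) -> page.get(script);

        System.out.println(fake.executeScript(BODY_INNER_HTML)); // no <title>
        System.out.println(fetchFullHtml(fake));                 // includes <title>
    }
}
```

      A note on the design choice: the Selenium Javadoc for getPageSource() warns that if the page was modified after loading (for example by JavaScript), the returned text is not guaranteed to be the modified page. Since rendering JavaScript is the reason to use protocol-selenium at all, reading outerHTML of the live DOM may be the safer option. Either way, outerHTML does not include the <!DOCTYPE> declaration.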

People

  Assignee: Unassigned
  Reporter: Tim Allison (tallison)