Nutch / NUTCH-3001

protocol-selenium requires Content-Type header

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.20
    • Component/s: None
    • Labels: None

    Description

      It looks like the selenium protocol requires a Content-Type header to be present.

      The logic seems to be: if the content type is HTML or XHTML, use Selenium; otherwise, just grab the bytes.

      However, with the current logic, if the content type is null, nothing is pulled.

      My guess is that the logic should be: if the content type is not null and is HTML or XHTML, use Selenium; otherwise, grab the bytes.

      Right?

      String contentType = getHeader(Response.CONTENT_TYPE);

      // handle with Selenium only if content type in HTML or XHTML
      if (contentType != null) {
        if (contentType.contains("text/html")
            || contentType.contains("application/xhtml")) {
          readPlainContent(url);
        } else {
      ...
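
      A minimal sketch of the routing proposed above, not the actual Nutch patch. The class ContentTypeRouting, the enum Fetcher, and the method choose are hypothetical names introduced only for illustration; the sketch models just the decision, assuming a missing Content-Type header should fall through to the plain byte fetch:

```java
// Hypothetical helper, not part of Nutch: models only the routing decision
// described in this issue, under the assumption that a null Content-Type
// should be treated like any non-HTML type.
final class ContentTypeRouting {

  enum Fetcher { SELENIUM, PLAIN_BYTES }

  /** Null-safe routing: Selenium only for HTML/XHTML, raw bytes otherwise. */
  static Fetcher choose(String contentType) {
    if (contentType != null
        && (contentType.contains("text/html")
            || contentType.contains("application/xhtml"))) {
      return Fetcher.SELENIUM;
    }
    // Missing header or non-HTML type: fall back to grabbing the bytes.
    return Fetcher.PLAIN_BYTES;
  }

  public static void main(String[] args) {
    System.out.println(choose("text/html; charset=UTF-8")); // SELENIUM
    System.out.println(choose("application/pdf"));          // PLAIN_BYTES
    System.out.println(choose(null));                       // PLAIN_BYTES
  }
}
```

      Folding the null check into the same condition as the type check gives a single fallback branch, so a response without a Content-Type header still gets its bytes read instead of being skipped.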
      

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)