  1. Nutch
  2. NUTCH-1929

Consider implementing dependency injection for crawling HTTPS sites that use self-signed certificates

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      It was mentioned a while ago that crawling sites with a self-signed certificate requires only a simple code modification to the protocol-httpclient plugin.

      In org.apache.nutch.protocol.httpclient.Http:
      
      Replace:
      
      ProtocolSocketFactory factory = new SSLProtocolSocketFactory();
      
      With:
      
      ProtocolSocketFactory factory = new DummySSLProtocolSocketFactory();
      

      I can confirm that this patch fixes the issue; however, the thread ends on a question that was never answered:

      "Is there dependency injection that can be used?"

      This issue should investigate what logic we need to implement so that the choice between the two socket factories is made at runtime rather than by editing the source.
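
      As a starting point for discussion, here is a minimal sketch of one way the decision could be made at runtime: a boolean setting selects which factory to instantiate. The property name http.ssl.accept.selfsigned and the class SocketFactorySelector are hypothetical and do not exist in Nutch. The three nested types are stand-ins so that the sketch compiles on its own; in the plugin they would be the real ProtocolSocketFactory, SSLProtocolSocketFactory and DummySSLProtocolSocketFactory, and the setting would presumably be read from the Nutch configuration rather than from java.util.Properties.

```java
import java.util.Properties;

final class SocketFactorySelector {

    // Hypothetical property name; not an existing Nutch setting.
    static final String ACCEPT_SELF_SIGNED = "http.ssl.accept.selfsigned";

    // Stand-ins for the commons-httpclient / Nutch types named in this issue,
    // declared here only so the sketch is self-contained.
    interface ProtocolSocketFactory {}
    static final class SSLProtocolSocketFactory implements ProtocolSocketFactory {}
    static final class DummySSLProtocolSocketFactory implements ProtocolSocketFactory {}

    private SocketFactorySelector() {}

    /**
     * Returns the permissive factory only when the setting is explicitly
     * "true"; a missing or malformed value falls back to the strict factory.
     */
    static ProtocolSocketFactory select(Properties conf) {
        boolean acceptSelfSigned =
            Boolean.parseBoolean(conf.getProperty(ACCEPT_SELF_SIGNED, "false"));
        return acceptSelfSigned
            ? new DummySSLProtocolSocketFactory()
            : new SSLProtocolSocketFactory();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(select(conf).getClass().getSimpleName());

        conf.setProperty(ACCEPT_SELF_SIGNED, "true");
        System.out.println(select(conf).getClass().getSimpleName());
    }
}
```

      With something like this in place, the line in Http would become a call to the selector instead of a hard-coded constructor. Defaulting to the strict factory keeps certificate validation on unless an operator opts out explicitly.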

              People

              • Assignee: Lewis John McGibbney (lewismc)
              • Reporter: Lewis John McGibbney (lewismc)