NUTCH-2653

ProtocolFactory.getProtocol(url) creates separate plugin instances for http/https

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.16
    • Component/s: fetcher, protocol
    • Labels:
      None

      Description

      Fetcher creates two instances of the protocol-okhttp plugin, one to handle http requests, another for https. The plugin properties are logged during plugin instantiation when calling setConf(...):

      2018-10-11 13:28:34,417 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 40 fetching http://...
      ...
      2018-10-11 13:28:35,099 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
      2018-10-11 13:28:35,100 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.port = 8080
      ...
      2018-10-11 13:28:36,864 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 87 fetching https://...
      ...
      2018-10-11 13:28:36,864 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
      2018-10-11 13:28:36,864 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.port = 8080
      

      The question is whether this is the correct behavior for plugins that support multiple protocols (http and https). Separate instances may prevent connection pooling and other network optimizations from working as expected. Of course, separate instances are correct when different plugins are required, e.g., for ftp or the local file system.
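
To make the question concrete, here is a minimal Java sketch of the alternative: caching plugin instances by plugin ID instead of by URL scheme, so that http and https URLs handled by the same plugin share one instance (and therefore one connection pool). All names here (SharedProtocolCache, register, the nested Protocol interface) are hypothetical illustrations. This is not the Nutch ProtocolFactory API and not necessarily how the issue was resolved.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Illustrative cache only; names and structure are assumptions, not Nutch code. */
class SharedProtocolCache {

  /** Hypothetical stand-in for a protocol plugin. */
  public interface Protocol {
    String fetch(String url);
  }

  // URL scheme (e.g. "http", "https") -> ID of the plugin that handles it
  private final Map<String, String> schemeToPluginId = new ConcurrentHashMap<>();
  // plugin ID -> factory that creates the plugin instance on first use
  private final Map<String, Supplier<Protocol>> factories = new ConcurrentHashMap<>();
  // plugin ID -> the single shared instance
  private final Map<String, Protocol> instances = new ConcurrentHashMap<>();

  /** Registers one plugin as the handler for one or more URL schemes. */
  public void register(String pluginId, Supplier<Protocol> factory, String... schemes) {
    factories.put(pluginId, factory);
    for (String scheme : schemes) {
      schemeToPluginId.put(scheme.toLowerCase(Locale.ROOT), pluginId);
    }
  }

  /** Returns the shared instance of whichever plugin handles the URL's scheme. */
  public Protocol getProtocol(String url) {
    int colon = url.indexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("No scheme in URL: " + url);
    }
    String scheme = url.substring(0, colon).toLowerCase(Locale.ROOT);
    String pluginId = schemeToPluginId.get(scheme);
    if (pluginId == null) {
      throw new IllegalArgumentException("No plugin registered for scheme: " + scheme);
    }
    // Keyed by plugin ID, not scheme: http and https resolve to the same instance.
    return instances.computeIfAbsent(pluginId, id -> factories.get(id).get());
  }
}
```

With a cache like this, a plugin registered for both "http" and "https" is instantiated once, while a plugin registered for "ftp" under a different ID still gets its own instance.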

      (seen while reviewing the behavior of fetcher with fix for NUTCH-2625 applied)

              People

              • Assignee:
                Unassigned
              • Reporter:
                Sebastian Nagel (snagel)