ManifoldCF / CONNECTORS-854

Enable STALE_CONNECTION_CHECK

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: ManifoldCF 1.4.1
    • Fix Version/s: ManifoldCF 1.5
    • Component/s: Web connector
    • Labels:
      None

      Description

      When crawling some small sites (fewer than 1000 documents), manifoldcf.log sometimes shows "The target server failed to respond" warnings like the ones below. It appears that NoHttpResponseException is being thrown in ThrottledFetcher.

       WARN 2014-01-09 12:39:16,701 (Worker thread '10') - Pre-ingest service interruption reported for job 1389238470356 connection '1': Timed out waiting for response for 'http://www.rondhuit.com/?p=1890': The target server failed to respond
       WARN 2014-01-09 12:39:55,509 (Worker thread '7') - Pre-ingest service interruption reported for job 1389238470356 connection '1': Timed out waiting for response for 'http://www.rondhuit.com/?p=675': The target server failed to respond
      

      Fetching the same pages again after the retry interval (15 minutes) had passed succeeded.

      I changed the HttpClient configuration as follows and confirmed that the message no longer appeared.

      +++ connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java
      @@ -463,7 +463,7 @@
               BasicHttpParams params = new BasicHttpParams();
               params.setParameter(ClientPNames.DEFAULT_HOST,fetchHost);
               params.setBooleanParameter(CoreConnectionPNames.TCP_NODELAY,true);
      -        params.setBooleanParameter(CoreConnectionPNames.STALE_CONNECTION_CHECK,false);
      +        params.setBooleanParameter(CoreConnectionPNames.STALE_CONNECTION_CHECK,true);
               params.setBooleanParameter(ClientPNames.ALLOW_CIRCULAR_REDIRECTS,true);
      

      I know of two users who hit this issue and resolved it by turning on the stale connection check.
      The crawl job also finishes more quickly than with the check set to false, because there are no retry fetches.
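
To illustrate what a stale connection check does in general, here is a minimal sketch using only plain java.net sockets. It is not ManifoldCF or HttpClient code: the class and method names (StaleCheckSketch, isStale, demo) are hypothetical, and the approach (a read with a very short timeout before reusing an idle connection) is a simplified assumption about how such checks typically work.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch; not taken from ManifoldCF or HttpClient.
class StaleCheckSketch {

    // Returns true if the peer appears to have closed the connection.
    // Caveat: this consumes one byte if the peer has sent unsolicited data,
    // which is acceptable only for an idle keep-alive connection.
    static boolean isStale(Socket socket) {
        try {
            int oldTimeout = socket.getSoTimeout();
            try {
                socket.setSoTimeout(1);               // wait at most ~1 ms
                return socket.getInputStream().read() < 0; // -1 means EOF
            } finally {
                socket.setSoTimeout(oldTimeout);      // restore caller's timeout
            }
        } catch (SocketTimeoutException e) {
            return false; // nothing to read, but the connection is still open
        } catch (IOException e) {
            return true;  // any other I/O failure: treat as unusable
        }
    }

    // Opens a loopback connection, checks it, lets the "server" side close it,
    // then checks again. Returns { staleBeforeClose, staleAfterClose }.
    static boolean[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            Socket accepted = server.accept();
            boolean before = isStale(client);
            accepted.close();      // server drops the idle connection
            Thread.sleep(200);     // give the close time to reach the client
            boolean after = isStale(client);
            return new boolean[] { before, after };
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("stale before server close: " + result[0]);
        System.out.println("stale after server close:  " + result[1]);
    }
}
```

Without a check like this, a client that reuses a connection the server has already closed only discovers the problem when its request goes unanswered, which would be consistent with the "failed to respond" warnings above. The cost is a small delay before each reuse of a pooled connection.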

      May I switch the stale connection check from false to true, as in the Solr connector's HttpClient configuration?

            People

            • Assignee: Karl Wright (kwright@metacarta.com)
            • Reporter: Shinichiro Abe