Nutch: NUTCH-2579

Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Component/s: fetcher, protocol
    • Labels:
      None

      Description

      ProtocolFactory.getProtocol(url) is synchronized, so in a multi-threaded fetcher the fetcher threads block waiting for its lock. The method takes the URL as a string, although it would be more efficient to pass the parsed URL already held in the FetchItem; the lock could then be released sooner. In addition, parsing the URL string acquires a second lock, in the URL stream handler lookup:

      "FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at java.util.Hashtable.get(Hashtable.java:363)
              - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
              at java.net.URL.getURLStreamHandler(URL.java:1135)
              at java.net.URL.<init>(URL.java:599)
              at java.net.URL.<init>(URL.java:490)
              at java.net.URL.<init>(URL.java:439)
              at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
              - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
              at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
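      The locking pattern in this thread dump can be illustrated with a minimal, self-contained Java sketch. It is not Nutch's ProtocolFactory: SimpleProtocolFactory, Protocol, parse() and the cached "handler-<scheme>" objects are hypothetical stand-ins. It contrasts a synchronized getProtocol(String), which constructs the java.net.URL while holding the factory lock (the pattern shown above), with an overload that takes an already-parsed URL, so the lock only covers the cache lookup.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class ProtocolFactorySketch {

    /** Hypothetical stand-in for a protocol plugin; not Nutch's Protocol interface. */
    interface Protocol {
        String name();
    }

    /** Parse a URL string once, e.g. when a fetch item is created (hypothetical helper). */
    static URL parse(String urlString) {
        try {
            // new URL(...) resolves the URLStreamHandler, which can contend on a shared lock
            return new URL(urlString);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(urlString, e);
        }
    }

    /** Hypothetical factory caching one Protocol instance per URL scheme. */
    static class SimpleProtocolFactory {
        private final Map<String, Protocol> cache = new HashMap<>();

        /** String variant: the URL is parsed while this factory's lock is held. */
        public synchronized Protocol getProtocol(String urlString) {
            return lookup(parse(urlString).getProtocol());
        }

        /** URL variant: the caller parsed the URL earlier, so the lock covers only the lookup. */
        public synchronized Protocol getProtocol(URL url) {
            return lookup(url.getProtocol());
        }

        // Called only from synchronized methods, so a plain HashMap is sufficient.
        private Protocol lookup(String scheme) {
            return cache.computeIfAbsent(scheme, s -> () -> "handler-" + s);
        }
    }

    public static void main(String[] args) {
        SimpleProtocolFactory factory = new SimpleProtocolFactory();
        URL parsed = parse("https://nutch.apache.org/"); // parsed once, outside the factory lock
        Protocol p1 = factory.getProtocol(parsed);
        Protocol p2 = factory.getProtocol("https://example.org/page");
        System.out.println(p1.name()); // handler-https
        System.out.println(p1 == p2);  // true: same cached instance for the "https" scheme
    }
}
```

      Note that the URL is still parsed once per item, so the URLStreamHandler lookup is not eliminated; it is moved out of the factory's critical section, and reusing the parsed object avoids parsing the same string again on every call.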
      


              People

              • Assignee:
                Sebastian Nagel (wastl-nagel)
                Reporter:
                Sebastian Nagel (wastl-nagel)
              • Votes:
                0
                Watchers:
                3
