Description
This is minor, but it's a little unclean.
Reproduce: Use a URL file containing one URL followed by a newline, so that the file ends with an empty line.
Outcome: The fetcher threads try to fetch two URLs at the same time. The first one is fine, but the second is empty and therefore fails protocol detection.
060521 022639 Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
060521 022639 Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
060521 022639 found resource parse-plugins.xml at file:/home/mm/nutch-nightly/conf/parse-plugins.xml
060521 022639 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
060521 022639 fetching http://www.bild.de/
060521 022639 fetching
060521 022639 fetch of failed with: org.apache.nutch.protocol.ProtocolNotFound: java.net.MalformedURLException: no protocol:
060521 022639 http.proxy.host = null
060521 022639 http.proxy.port = 8080
060521 022639 http.timeout = 10000
060521 022639 http.content.limit = 65536
060521 022639 http.agent = NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
060521 022639 fetcher.server.delay = 1000
060521 022639 http.max.delays = 1000
060521 022640 ParserFactory:Plugin: org.apache.nutch.parse.text.TextParser mapped to contentType text/xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: text/xml
060521 022640 ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser mapped to contentType text/xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: text/xml
060521 022640 ParserFactory: Plugin: org.apache.nutch.parse.rss.RSSParser mapped to contentType text/xml via parse-plugins.xml, but not enabled via plugin.includes in nutch-default.xml
060521 022640 Using Signature impl: org.apache.nutch.crawl.MD5Signature
060521 022640 map 0% reduce 0%
060521 022640 1 pages, 1 errors, 1.0 pages/s, 40 kb/s,
060521 022640 1 pages, 1 errors, 1.0 pages/s, 40 kb/s,
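
One possible way to avoid the spurious fetch would be to drop blank lines before the URLs reach the fetcher. The following is only a minimal sketch of that idea, not Nutch's actual code: the class `SeedUrlFilter` and its methods `nonBlankUrls` and `hasProtocol` are hypothetical names made up for illustration. It also shows that an empty string is rejected by `java.net.URL` with a `MalformedURLException`, which matches the "no protocol" error in the log above.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Nutch.
public class SeedUrlFilter {

    // Returns the trimmed, non-empty lines of a URL file, in order.
    public static List<String> nonBlankUrls(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                urls.add(trimmed);
            }
        }
        return urls;
    }

    // Returns true if java.net.URL accepts the string, false if it
    // throws MalformedURLException (as it does for an empty string).
    public static boolean hasProtocol(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Simulates a URL file with one URL followed by an empty line.
        List<String> lines = Arrays.asList("http://www.bild.de/", "");
        for (String url : nonBlankUrls(lines)) {
            System.out.println("would fetch " + url);
        }
        System.out.println("empty line has protocol: " + hasProtocol(""));
    }
}
```

With a filter like this applied when the URL file is read, only the real URL would be handed to the fetcher threads, and the empty entry would never reach protocol detection.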