Nutch: fetcher (Component)
Open Issues
49 unresolved issues.
Versions (with open issues due to be fixed per version for this component)
NUTCH-475  UNRESOLVED  Adaptive crawl delay
NUTCH-478  UNRESOLVED  Add function for stopping FetcherThread gracefully
NUTCH-207  UNRESOLVED  Bandwidth target for fetcher rather than a thread count
NUTCH-496  UNRESOLVED  ConcurrentModificationException can be thrown when getSorted() is called.
NUTCH-289  UNRESOLVED  CrawlDatum should store IP address
NUTCH-629  UNRESOLVED  Detect slow and timeout servers and drop their URLs
NUTCH-755  UNRESOLVED  DomainURLFilter crashes on malformed URL
NUTCH-87  UNRESOLVED  Efficient site-specific crawling for a large number of sites
NUTCH-628  UNRESOLVED  Host database to keep track of host-level information
NUTCH-283  UNRESOLVED  If the Fetcher times out and abandons Fetcher Threads, severe errors will occur on those Threads
NUTCH-709  UNRESOLVED  JSParseFilter gets into an infinite loop and eats all the stack
NUTCH-649  UNRESOLVED  Log list of files found but not crawled.
NUTCH-714  UNRESOLVED  Need a SFTP and SCP Protocol Handler
NUTCH-424  UNRESOLVED  NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4))
NUTCH-753  UNRESOLVED  Prevent new Fetcher to retrieve the robots twice
NUTCH-460  UNRESOLVED  RDF parser plugin
NUTCH-644  UNRESOLVED  RTF parser doesn't compile anymore
NUTCH-119  UNRESOLVED  Regexp to extract outlinks incorrect
NUTCH-385  UNRESOLVED  Server delay feature conflicts with maxThreadsPerHost
NUTCH-751  UNRESOLVED  Upgrade version of HttpClient
NUTCH-185  UNRESOLVED  XMLParser is configurable xml parser plugin.
NUTCH-719  UNRESOLVED  fetchQueues.totalSize incorrect in Fetcher2
NUTCH-414  UNRESOLVED  parse-mp3 plugin concatenating previous tags for text field
NUTCH-409  UNRESOLVED  Add "short circuit" notion to filters to speedup mixed site/subsite crawling
NUTCH-740  UNRESOLVED  Configuration option to override default language for fetched pages.
NUTCH-490  UNRESOLVED  Extension point with filters for Neko HTML parser (with patch)
NUTCH-410  UNRESOLVED  Faster RegexNormalize with more features
NUTCH-84  UNRESOLVED  Fetcher for constrained crawls
NUTCH-363  UNRESOLVED  Fetcher normalizes everything at least twice
NUTCH-769  UNRESOLVED  Fetcher to skip queues for URLS getting repeated exceptions
NUTCH-49  UNRESOLVED  Flag for generate to fetch only new pages to complement the -refetchonly flag
NUTCH-13  UNRESOLVED  If dns points to 127.0.0.1, the url is also crawled
NUTCH-295  UNRESOLVED  More description for fetcher.threads.fetch property
NUTCH-158  UNRESOLVED  Process Sitemap data in text, rss or xml format as well as OAI-PMH
NUTCH-351  UNRESOLVED  Protocol forward proxy
NUTCH-569  UNRESOLVED  Protocol plugins should report progress to the fetcher
NUTCH-98  UNRESOLVED  RobotRulesParser interprets robots.txt incorrectly
NUTCH-566  UNRESOLVED  Sun's URL class has bug in creation of relative query URLs
NUTCH-18  UNRESOLVED  Windows servers include illegal characters in URLs
NUTCH-208  UNRESOLVED  http: proxy exception list:
NUTCH-705  UNRESOLVED  parse-rtf plugin
NUTCH-427  UNRESOLVED  protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
NUTCH-658  UNRESOLVED  Add Counter for # of doc fetched in Reporter
NUTCH-113  UNRESOLVED  Disable permanent DNS-to-IP caching for JVM 1.4
NUTCH-278  UNRESOLVED  Fetcher-status might need clarification: kbit/s instead of kb/s shown
NUTCH-182  UNRESOLVED  Log when db.max configuration limits reached
NUTCH-26  UNRESOLVED  New Http Authentication mechanism
NUTCH-100  UNRESOLVED  New plugin urlfilter-db
NUTCH-249  UNRESOLVED  black- white list url filtering
0.8.2: 1
1.1: 6
Unscheduled: 42
Component Summary
Open: 49 (27%)
Resolved: 5 (3%)
Closed: 126 (70%)
Open Issues
By Priority
Major: 23 (47%)
Minor: 19 (39%)
Trivial: 7 (14%)
By Assignee
Chris A. Mattmann: 2 (4%)
Dennis Kubes: 2 (4%)
Otis Gospodnetic: 2 (4%)
Sami Siren: 1 (2%)
Unassigned: 42 (86%)