Summary
Project: Nutch
Components: fetcher
Resolutions: Unresolved
Sorted by: Priority descending
Issue Navigator
Displaying issues 1 to 49 of 49 matching issues.
| Key | Summary | Assignee | Reporter | Status | Resolution | Created | Updated | Patch Info |
|---|---|---|---|---|---|---|---|---|
| NUTCH-87 | Efficient site-specific crawling for a large number of sites | Unassigned | AJ Chen | Open | UNRESOLVED | 03/Sep/05 | 20/Jan/06 | |
| NUTCH-119 | Regexp to extract outlinks incorrect | Unassigned | Sébastien Le Callonnec | Open | UNRESOLVED | 21/Oct/05 | 21/Oct/05 | |
| NUTCH-207 | Bandwidth target for fetcher rather than a thread count | Unassigned | Rod Taylor | Open | UNRESOLVED | 08/Feb/06 | 04/Dec/08 | |
| NUTCH-719 | fetchQueues.totalSize incorrect in Fetcher2 | Unassigned | Julien Nioche | Open | UNRESOLVED | 12/Mar/09 | 13/Jul/09 | |
| NUTCH-478 | Add function for stopping FetherThread gracefully | Unassigned | chee.wu | Open | UNRESOLVED | 05/May/07 | 05/May/07 | |
| NUTCH-424 | NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4)) | Unassigned | Karsten Dello | Open | UNRESOLVED | 01/Jan/07 | 10/May/07 | |
| NUTCH-283 | If the Fetcher times out and abandons Fetcher Threads, severe errors will occur on those Threads | Unassigned | Scott Ganyo | Open | UNRESOLVED | 25/May/06 | 31/May/06 | |
| NUTCH-385 | Server delay feature conflicts with maxThreadsPerHost | Unassigned | Chris Schneider | Open | UNRESOLVED | 11/Oct/06 | 24/Oct/06 | |
| NUTCH-414 | parse-mp3 plugin concatenating previous tags for text field | Unassigned | Brian Whitman | Open | UNRESOLVED | 12/Dec/06 | 12/Dec/06 | |
| NUTCH-753 | Prevent new Fetcher to retrieve the robots twice | Unassigned | Julien Nioche | Open | UNRESOLVED | 07/Sep/09 | 07/Sep/09 | |
| NUTCH-475 | Adaptive crawl delay | Unassigned | Doğacan Güney | Open | UNRESOLVED | 25/Apr/07 | 20/Feb/09 | |
| NUTCH-289 | CrawlDatum should store IP address | Unassigned | Doug Cutting | Open | UNRESOLVED | 27/May/06 | 27/Jun/07 | |
| NUTCH-496 | ConcurrentModificationException can be thrown when getSorted() is called. | Unassigned | Briggs | Open | UNRESOLVED | 04/Jun/07 | 05/Jun/07 | |
| NUTCH-751 | Upgrade version of HttpClient | Unassigned | Julien Nioche | Open | UNRESOLVED | 04/Sep/09 | 09/Sep/09 | |
| NUTCH-649 | Log list of files found but not crawled. | Unassigned | Jim | Open | UNRESOLVED | 28/Aug/08 | 28/Aug/08 | |
| NUTCH-629 | Detect slow and timeout servers and drop their URLs | Otis Gospodnetic | Otis Gospodnetic | Open | UNRESOLVED | 12/Apr/08 | 21/May/08 | Patch Available |
| NUTCH-460 | RDF parser plugin | Unassigned | Ricardo J. Méndez | Open | UNRESOLVED | 17/Mar/07 | 20/Feb/09 | |
| NUTCH-644 | RTF parser doesn't compile anymore | Unassigned | Guillaume Smet | Open | UNRESOLVED | 08/Aug/08 | 27/Feb/09 | Patch Available |
| NUTCH-755 | DomainURLFilter crashes on malformed URL | Unassigned | Mike Baranczak | Open | UNRESOLVED | 17/Sep/09 | 26/Oct/09 | |
| NUTCH-628 | Host database to keep track of host-level information | Unassigned | Otis Gospodnetic | Open | UNRESOLVED | 12/Apr/08 | 28/Jan/09 | |
| NUTCH-185 | XMLParser is configurable xml parser plugin. | Chris A. Mattmann | Rida Benjelloun | Open | UNRESOLVED | 25/Jan/06 | 27/Feb/09 | |
| NUTCH-714 | Need a SFTP and SCP Protocol Handler | Chris A. Mattmann | Sanjoy Ghosh | Open | UNRESOLVED | 10/Mar/09 | 24/Mar/09 | |
| NUTCH-709 | JSParseFilter gets into an infinate loop and ets all the stack | Unassigned | Tim Hawkins | Open | UNRESOLVED | 03/Mar/09 | 07/Jun/09 | |
| NUTCH-49 | Flag for generate to fetch only new pages to complement the -refetchonly flag | Unassigned | Luke Baker | Open | UNRESOLVED | 21/Apr/05 | 25/Oct/05 | |
| NUTCH-18 | Windows servers include illegal characters in URLs | Unassigned | Stefan Groschupf | Open | UNRESOLVED | 26/Mar/05 | 27/Apr/06 | |
| NUTCH-13 | If dns points to 127.0.0.1, the url is also crawled | Unassigned | Matthias Jaekle | Open | UNRESOLVED | 19/Mar/05 | 22/Apr/05 | |
| NUTCH-427 | protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implmentation. | Unassigned | Armel Nene | Open | UNRESOLVED | 05/Jan/07 | 08/Nov/08 | |
| NUTCH-84 | Fetcher for constrained crawls | Unassigned | Kelvin Tan | Open | UNRESOLVED | 25/Aug/05 | 27/Aug/05 | |
| NUTCH-98 | RobotRulesParser interprets robots.txt incorrectly | Unassigned | Jeff Bowden | Open | UNRESOLVED | 29/Sep/05 | 04/Dec/05 | |
| NUTCH-158 | Process Sitemap data in text, rss or xml format as well as OAI-PMH | Unassigned | byron miller | Open | UNRESOLVED | 30/Dec/05 | 08/Feb/06 | |
| NUTCH-208 | http: proxy exception list: | Unassigned | Matthias Günter | Open | UNRESOLVED | 09/Feb/06 | 07/Sep/06 | |
| NUTCH-351 | Protocol forward proxy | Sami Siren | Sami Siren | Open | UNRESOLVED | 17/Aug/06 | 17/Feb/09 | |
| NUTCH-295 | More description for fetcher.threads.fetch property | Dennis Kubes | Dennis Kubes | Open | UNRESOLVED | 02/Jun/06 | 31/Mar/08 | |
| NUTCH-490 | Extension point with filters for Neko HTML parser (with patch) | Unassigned | Marcin Okraszewski | Open | UNRESOLVED | 22/May/07 | 27/May/09 | |
| NUTCH-409 | Add "short circuit" notion to filters to speedup mixed site/subsite crawling | Unassigned | Doug Cook | Open | UNRESOLVED | 26/Nov/06 | 26/Nov/06 | |
| NUTCH-410 | Faster RegexNormalize with more features | Unassigned | Doug Cook | Open | UNRESOLVED | 29/Nov/06 | 29/Nov/06 | |
| NUTCH-566 | Sun's URL class has bug in creation of relative query URLs | Unassigned | Doug Cook | Open | UNRESOLVED | 10/Oct/07 | 14/Mar/08 | |
| NUTCH-569 | Protocol plugins should report progress to the fetcher | Unassigned | Andrzej Bialecki | Open | UNRESOLVED | 23/Oct/07 | 23/Oct/07 | |
| NUTCH-363 | Fetcher normalizes everything at least twice | Unassigned | Doug Cook | Open | UNRESOLVED | 08/Sep/06 | 16/Jan/08 | |
| NUTCH-705 | parse-rtf plugin | Unassigned | Dmitry Lihachev | Open | UNRESOLVED | 27/Feb/09 | 10/Mar/09 | Patch Available |
| NUTCH-740 | Configuration option to override default language for fetched pages. | Otis Gospodnetic | Marcin Okraszewski | Open | UNRESOLVED | 28/May/09 | 09/Jun/09 | Patch Available |
| NUTCH-769 | Fetcher to skip queues for URLS getting repeated exceptions | Unassigned | Julien Nioche | Open | UNRESOLVED | 23/Nov/09 | 23/Nov/09 | Patch Available |
| NUTCH-26 | New Http Authentication mechanism | Unassigned | Stefan Groschupf | Open | UNRESOLVED | 26/Mar/05 | 04/Apr/05 | |
| NUTCH-100 | New plugin urlfilter-db | Unassigned | Gal Nitzan | Open | UNRESOLVED | 30/Sep/05 | 25/Feb/06 | |
| NUTCH-113 | Disable permanent DNS-to-IP caching for JVM 1.4 | Unassigned | Fuad Efendi | Open | UNRESOLVED | 16/Oct/05 | 16/Oct/05 | |
| NUTCH-182 | Log when db.max configuration limits reached | Unassigned | Matt Kangas | Open | UNRESOLVED | 20/Jan/06 | 20/Jan/06 | |
| NUTCH-278 | Fetcher-status might need clarification: kbit/s instead of kb/s shown | Unassigned | Stefan Neufeind | Open | UNRESOLVED | 22/May/06 | 22/May/06 | |
| NUTCH-658 | Add Counter for # of doc fetched in Reporter | Unassigned | Julien Nioche | Open | UNRESOLVED | 31/Oct/08 | 08/Dec/08 | Patch Available |
| NUTCH-249 | black- white list url filtering | Dennis Kubes | Stefan Groschupf | Open | UNRESOLVED | 17/Apr/06 | 29/Jul/09 | |