Nutch (Key: NUTCH)
Project Lead: Andrzej Bialecki
URL: http://lucene.apache.org/nutch/
Popular Issues
(shows the unresolved issues sorted by number of votes, with fix-for versions)
Votes | Key | Summary | Fix Version
10 | NUTCH-48 | "Did you mean" query enhancement/refignment feature request |
10 | NUTCH-251 | Administration GUI | 1.1
9 | NUTCH-650 | Hbase Integration | 1.1
6 | NUTCH-185 | XMLParser is configurable xml parser plugin. |
5 | NUTCH-92 | DistributedSearch incorrectly scores results |
5 | NUTCH-289 | CrawlDatum should store IP address |
5 | NUTCH-36 | Chinese in Nutch |
3 | NUTCH-87 | Efficient site-specific crawling for a large number of sites |
3 | NUTCH-710 | Support for rel="canonical" attribute | 1.1
3 | NUTCH-296 | Image Search |
3 | NUTCH-364 | Javascript parser creates some fairly bogus URLs |
3 | NUTCH-477 | Extend URLFilters to support different filtering chains | 1.1
2 | NUTCH-16 | boost documents matching a url pattern |
2 | NUTCH-62 | Add html META tag information into metaData in index-more plugin |
2 | NUTCH-84 | Fetcher for constrained crawls |
2 | NUTCH-100 | New plugin urlfilter-db |
2 | NUTCH-719 | fetchQueues.totalSize incorrect in Fetcher2 |
2 | NUTCH-570 | Improvement of URL Ordering in Generator.java |
2 | NUTCH-377 | Add possibility to search for multiple values |
2 | NUTCH-356 | Plugin repository cache can lead to memory leak |
1 | NUTCH-50 | Benchmarks & Performance goals |
1 | NUTCH-79 | Fault tolerant searching. |
1 | NUTCH-412 | plugin to parse the feed-url (rss/atom) of a blog |
1 | NUTCH-144 | corrupt language identifier tri files and bad language recognition for german |
1 | NUTCH-158 | Process Sitemap data in text, rss or xml format as well as OAI-PMH |
1 | NUTCH-215 | Plugin execution order |
1 | NUTCH-389 | a url tokenizer implementation for tokenizing index fields : url and host |
1 | NUTCH-281 | cached.jsp: base-href needs to be outside comments |
1 | NUTCH-424 | NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4)) |
1 | NUTCH-540 | some problem about the Nutch cache | 1.1
1 | NUTCH-278 | Fetcher-status might need clarification: kbit/s instead of kb/s shown |
1 | NUTCH-283 | If the Fetcher times out and abandons Fetcher Threads, severe errors will occur on those Threads |
1 | NUTCH-290 | parse-pdf: Garbage indexed when text-extraction not allowed |
1 | NUTCH-326 | WordExtractor throws java.util.NoSuchElementException on some documents |
1 | NUTCH-342 | Nutch commands log to nutch/logs/hadoop.logs by default |
1 | NUTCH-346 | Improve readability of logs/hadoop.log |
1 | NUTCH-355 | The title of query result could like the summary have the highlight?? |
1 | NUTCH-224 | Nutch doesn't handle Korean text at all |
1 | NUTCH-366 | Merge URLFilters and URLNormalizers |
1 | NUTCH-86 | LanguageIdentifier API enhancements |
1 | NUTCH-475 | Adaptive crawl delay | 1.1
1 | NUTCH-566 | Sun's URL class has bug in creation of relative query URLs |
1 | NUTCH-568 | Indexer does not update the Lucene "TITLE" field |
1 | NUTCH-558 | Need tool to retrieve domain statistics |
1 | NUTCH-479 | Support for OR queries | 1.1
1 | NUTCH-751 | Upgrade version of HttpClient |
1 | NUTCH-466 | Flexible segment format |
1 | NUTCH-249 | black- white list url filtering | 1.1
1 | NUTCH-577 | Use explicit tika-config.xml file to enable mime magic detection to be turned on and off | 1.1
1 | NUTCH-445 | Domain İndexing / Query Filter |
Project Summary
Open: 201 (26%)
In Progress: 3
Reopened: 1
Resolved: 12 (2%)
Closed: 542 (71%)

Open Issues

By Priority
Critical: 1
Major: 100 (49%)
Minor: 83 (40%)
Trivial: 21 (10%)

By Assignee
Andrzej Bialecki: 5 (2%)
Chris A. Mattmann: 7 (3%)
Chris Schneider: 1
Dennis Kubes: 10 (5%)
Doug Cutting: 2 (1%)
Doğacan Güney: 1
Enis Soztutar: 3 (1%)
Jerome Charron: 2 (1%)
Otis Gospodnetic: 4 (2%)
Sami Siren: 3 (1%)
Unassigned: 167 (81%)