Nutch (Key: NUTCH)
Project Lead: Andrzej Bialecki
URL: http://lucene.apache.org/nutch/
Road Map
0.8.2
Progress: 6 of 7 issues have been resolved
Next maintenance release for 0.8.x branch
NUTCH-714 | UNRESOLVED | Need a SFTP and SCP Protocol Handler
NUTCH-361 | FIXED | generator create fetchlist randomly
NUTCH-401 | FIXED | Hardcoded /tmp directory in SegmentReader
NUTCH-379 | FIXED | ParseUtil does not pass through the content's URL to the ParserFactory
NUTCH-391 | FIXED | ParseUtil logs file contents to log file when it cannot find parser
NUTCH-394 | FIXED | Searching via Tomcat / nutch-0.9-dev.war raises exception
NUTCH-274 | FIXED | Empty row in/at end of URL-list results in error
0.7.3
Progress: No issues.
Maintenance release for the 0.7 branch, made so that people using the 0.7 branch can get the latest bugfixes.
1.1
Progress: 12 of 51 issues have been resolved
Development for 1.1 release
NUTCH-475 | UNRESOLVED | Adaptive crawl delay
NUTCH-666 | UNRESOLVED | Analysis plugins for multiple language and new Language Identifier Tool
NUTCH-583 | UNRESOLVED | FeedParser empty links for items
NUTCH-650 | UNRESOLVED | Hbase Integration
NUTCH-628 | UNRESOLVED | Host database to keep track of host-level information
NUTCH-541 | UNRESOLVED | Index url field untokenized
NUTCH-717 | UNRESOLVED | Make Nutch Solr integration easier
NUTCH-716 | UNRESOLVED | Make subcollection index filed multivalued
NUTCH-573 | UNRESOLVED | Multiple Domains - Query Search
NUTCH-729 | UNRESOLVED | NPE in FieldIndexer when BasicFields url doesn't exist
NUTCH-746 | UNRESOLVED | NutchBeanConstructor does not close NutchBean upon contextDestroyed, causing resource leak in the container.
NUTCH-460 | UNRESOLVED | RDF parser plugin
NUTCH-677 | UNRESOLVED | Segment merge filering based on segment content
NUTCH-739 | UNRESOLVED | SolrDeleteDuplications too slow when using hadoop
NUTCH-479 | UNRESOLVED | Support for OR queries
NUTCH-768 | UNRESOLVED | Upgrade Nutch 1.0 to use Hadoop 0.20
NUTCH-469 | UNRESOLVED | changes to geoPosition plugin to make it work on nutch 0.9
NUTCH-455 | UNRESOLVED | dedup on tokenized fields is faulty
NUTCH-747 | UNRESOLVED | inject&Index metadatas and inherit these metadatas to all matching suburls
NUTCH-540 | UNRESOLVED | some problem about the Nutch cache
NUTCH-578 | UNRESOLVED | URL fetched with 403 is generated over and over again
NUTCH-251 | UNRESOLVED | Administration GUI
NUTCH-609 | UNRESOLVED | Allow Plugins to be Loaded from Jar File(s)
NUTCH-738 | UNRESOLVED | Close SegmentUpdater when FetchedSegments is closed
NUTCH-740 | UNRESOLVED | Configuration option to override default language for fetched pages.
NUTCH-477 | UNRESOLVED | Extend URLFilters to support different filtering chains
NUTCH-564 | UNRESOLVED | External parser supports encoding attribute
NUTCH-750 | UNRESOLVED | HtmlParser plugin - page title extraction
NUTCH-655 | UNRESOLVED | Injecting Crawl metadata
NUTCH-664 | UNRESOLVED | Possibility to update already stored documents.
NUTCH-310 | UNRESOLVED | Review Log Levels
NUTCH-763 | UNRESOLVED | Separate configuration files from resources to be included in the job file
NUTCH-710 | UNRESOLVED | Support for rel="canonical" attribute
NUTCH-673 | UNRESOLVED | Upgrade the Carrot2 plug-in to release 3.0
NUTCH-706 | UNRESOLVED | Url regex normalizer
NUTCH-577 | UNRESOLVED | Use explicit tika-config.xml file to enable mime magic detection to be turned on and off
NUTCH-705 | UNRESOLVED | parse-rtf plugin
NUTCH-309 | UNRESOLVED | Uses commons logging Code Guards
NUTCH-249 | UNRESOLVED | black- white list url filtering
NUTCH-756 | FIXED | CrawlDatum.set() does not reset Metadata if it is null
NUTCH-721 | FIXED | Fetcher2 Slow
NUTCH-707 | FIXED | Generation of multiple segments in multiple runs returns only 1 segment
NUTCH-702 | FIXED | Lazy Instanciation of Metadata in CrawlDatum
NUTCH-730 | FIXED | NPE in LinkRank if no nodes with which to create the WebGraph
NUTCH-731 | FIXED | Redirection of robots.txt in RobotRulesParser
NUTCH-765 | FIXED | Allow Crawl class to call Either Solr or Lucene Indexer
NUTCH-679 | FIXED | Fetcher2 implementing Tool
NUTCH-757 | FIXED | RequestUtils getBooleanParameter() always returns false
NUTCH-758 | FIXED | Set subversion eol-style to "native"
NUTCH-754 | FIXED | Use GenericOptionsParser instead of FileSystem.parseArgs()
NUTCH-735 | FIXED | crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command
Project Summary
Open: 201 (26%)
In Progress: 3
Reopened: 1
Resolved: 12 (2%)
Closed: 542 (71%)
Open Issues
By Priority
Critical: 1
Major: 100 (49%)
Minor: 83 (40%)
Trivial: 21 (10%)
By Assignee
Andrzej Bialecki: 5 (2%)
Chris A. Mattmann: 7 (3%)
Chris Schneider: 1
Dennis Kubes: 10 (5%)
Doug Cutting: 2 (1%)
Doğacan Güney: 1
Enis Soztutar: 3 (1%)
Jerome Charron: 2 (1%)
Otis Gospodnetic: 4 (2%)
Sami Siren: 3 (1%)
Unassigned: 167 (81%)