Nutch (Key: NUTCH)

Project Lead: Andrzej Bialecki
URL: http://lucene.apache.org/nutch/


Road Map

Progress: 
  6 of 7 issues have been resolved
Next maintenance release for 0.8.x branch
   New Feature NUTCH-714 UNRESOLVED Need a SFTP and SCP Protocol Handler Major Open
   Bug NUTCH-361 FIXED generator create fetchlist randomly Critical Closed
   Bug NUTCH-401 FIXED Hardcoded /tmp directory in SegmentReader Major Closed
   Bug NUTCH-379 FIXED ParseUtil does not pass through the content's URL to the ParserFactory Major Closed
   Bug NUTCH-391 FIXED ParseUtil logs file contents to log file when it cannot find parser Major Closed
   Bug NUTCH-394 FIXED Searching via Tomcat / nutch-0.9-dev.war raises exception Major Closed
   Bug NUTCH-274 FIXED Empty row in/at end of URL-list results in error Minor Closed
Progress: No issues.
Maintenance release for the 0.7 branch, made so that people using the 0.7 branch can get the latest bug fixes.
  No issues.
Progress: 
  12 of 51 issues have been resolved
Development for 1.1 release
   Improvement NUTCH-475 UNRESOLVED Adaptive crawl delay Major Open
   Improvement NUTCH-666 UNRESOLVED Analysis plugins for multiple language and new Language Identifier Tool Major Open
   Bug NUTCH-583 UNRESOLVED FeedParser empty links for items Major Open
   New Feature NUTCH-650 UNRESOLVED Hbase Integration Major Open
   New Feature NUTCH-628 UNRESOLVED Host database to keep track of host-level information Major Open
   New Feature NUTCH-541 UNRESOLVED Index url field untokenized Major Open
   New Feature NUTCH-717 UNRESOLVED Make Nutch Solr integration easier Major Open
   Improvement NUTCH-716 UNRESOLVED Make subcollection index field multivalued Major Open
   Improvement NUTCH-573 UNRESOLVED Multiple Domains - Query Search Major Open
   Bug NUTCH-729 UNRESOLVED NPE in FieldIndexer when BasicFields url doesn't exist Major Open
   Bug NUTCH-746 UNRESOLVED NutchBeanConstructor does not close NutchBean upon contextDestroyed, causing resource leak in the container. Major Open
   New Feature NUTCH-460 UNRESOLVED RDF parser plugin Major Open
   Improvement NUTCH-677 UNRESOLVED Segment merge filtering based on segment content Major Open
   Bug NUTCH-739 UNRESOLVED SolrDeleteDuplications too slow when using hadoop Major Open
   Improvement NUTCH-479 UNRESOLVED Support for OR queries Major Open
   Improvement NUTCH-768 UNRESOLVED Upgrade Nutch 1.0 to use Hadoop 0.20 Major Open
   Improvement NUTCH-469 UNRESOLVED changes to geoPosition plugin to make it work on nutch 0.9 Major Open
   Bug NUTCH-455 UNRESOLVED dedup on tokenized fields is faulty Major Open
   Improvement NUTCH-747 UNRESOLVED inject&Index metadatas and inherit these metadatas to all matching suburls Major Open
   Bug NUTCH-540 UNRESOLVED some problem about the Nutch cache Major Open
   Bug NUTCH-578 UNRESOLVED URL fetched with 403 is generated over and over again Major In Progress
   Improvement NUTCH-251 UNRESOLVED Administration GUI Minor Open
   Improvement NUTCH-609 UNRESOLVED Allow Plugins to be Loaded from Jar File(s) Minor Open
   Improvement NUTCH-738 UNRESOLVED Close SegmentUpdater when FetchedSegments is closed Minor Open
   Improvement NUTCH-740 UNRESOLVED Configuration option to override default language for fetched pages. Minor Open
   Improvement NUTCH-477 UNRESOLVED Extend URLFilters to support different filtering chains Minor Open
   Improvement NUTCH-564 UNRESOLVED External parser supports encoding attribute Minor Open
   Improvement NUTCH-750 UNRESOLVED HtmlParser plugin - page title extraction Minor Open
   Improvement NUTCH-655 UNRESOLVED Injecting Crawl metadata Minor Open
   Wish NUTCH-664 UNRESOLVED Possibility to update already stored documents. Minor Open
   Improvement NUTCH-310 UNRESOLVED Review Log Levels Minor Open
   Wish NUTCH-763 UNRESOLVED Separate configuration files from resources to be included in the job file Minor Open
   New Feature NUTCH-710 UNRESOLVED Support for rel="canonical" attribute Minor Open
   Improvement NUTCH-673 UNRESOLVED Upgrade the Carrot2 plug-in to release 3.0 Minor Open
   Bug NUTCH-706 UNRESOLVED Url regex normalizer Minor Open
   Improvement NUTCH-577 UNRESOLVED Use explicit tika-config.xml file to enable mime magic detection to be turned on and off Minor Open
   New Feature NUTCH-705 UNRESOLVED parse-rtf plugin Minor Open
   Improvement NUTCH-309 UNRESOLVED Uses commons logging Code Guards Minor Reopened
   Improvement NUTCH-249 UNRESOLVED black- white list url filtering Trivial Open
   Bug NUTCH-756 FIXED CrawlDatum.set() does not reset Metadata if it is null Blocker Closed
   Bug NUTCH-721 FIXED Fetcher2 Slow Major Closed
   Bug NUTCH-707 FIXED Generation of multiple segments in multiple runs returns only 1 segment Major Closed
   Improvement NUTCH-702 FIXED Lazy Instantiation of Metadata in CrawlDatum Major Closed
   Bug NUTCH-730 FIXED NPE in LinkRank if no nodes with which to create the WebGraph Major Closed
   Improvement NUTCH-731 FIXED Redirection of robots.txt in RobotRulesParser Major Closed
   Improvement NUTCH-765 FIXED Allow Crawl class to call Either Solr or Lucene Indexer Minor Closed
   Improvement NUTCH-679 FIXED Fetcher2 implementing Tool Minor Closed
   Bug NUTCH-757 FIXED RequestUtils getBooleanParameter() always returns false Minor Closed
   Task NUTCH-758 FIXED Set subversion eol-style to "native" Minor Closed
   Improvement NUTCH-754 FIXED Use GenericOptionsParser instead of FileSystem.parseArgs() Minor Closed
   Bug NUTCH-735 FIXED crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command Minor Closed

Project Summary

Open         201  (26%)
In Progress    3
Reopened       1
Resolved      12  (2%)
Closed       542  (71%)

Open Issues

By Priority
Critical    1
Major     100  (49%)
Minor      83  (40%)
Trivial    21  (10%)

By Assignee
Andrzej Bialecki    5  (2%)
Chris A. Mattmann   7  (3%)
Chris Schneider     1
Dennis Kubes       10  (5%)
Doug Cutting        2  (1%)
Doğacan Güney       1
Enis Soztutar       3  (1%)
Jerome Charron      2  (1%)
Otis Gospodnetic    4  (2%)
Sami Siren          3  (1%)
Unassigned        167  (81%)