All Projects : Nutch (Key: NUTCH)

Project Lead: Andrzej Bialecki
URL: http://lucene.apache.org/nutch/

Change Log

Nutch 1.0 release
   Bug NUTCH-698 FIXED CrawlDb is corrupted after a few crawl cycles Blocker Closed
   Bug NUTCH-694 FIXED Distributed Search Server fails Blocker Closed
   Bug NUTCH-688 FIXED Fix missing/wrong headers in source files Blocker Closed
   Bug NUTCH-631 FIXED MoreIndexingFilter fails with NoSuchElementException Blocker Closed
   Bug NUTCH-515 FIXED Next fetch time is set incorrectly Blocker Closed
   Bug NUTCH-722 FIXED Nutch contains jars that we cannot redistribute Blocker Closed
   Task NUTCH-621 FIXED Nutch needs to declare its crypto usage Blocker Closed
   Bug NUTCH-703 FIXED Upgrade to Hadoop 0.19.1 Blocker Closed
   Bug NUTCH-724 DUPLICATE Drop the JAI libraries Blocker Closed
   Bug NUTCH-678 FIXED Hadoop 0.19 requires an update of jets3t Critical Closed
   Bug NUTCH-641 FIXED IndexSorter incorrectly copies stored fields Critical Closed
   Bug NUTCH-700 FIXED Neko1.9.11 goes into a loop Critical Closed
   Bug NUTCH-508 FIXED ${hadoop.log.dir} and ${hadoop.log.file} are not propagated to the tasktracker Major Closed
   New Feature NUTCH-61 FIXED Adaptive re-fetch interval. Detecting unmodified content Major Closed
   Bug NUTCH-652 FIXED AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly Major Closed
   Bug NUTCH-727 FIXED Add KEYS file to release artifact Major Closed
   New Feature NUTCH-699 FIXED Add an "official" solr schema for solr integration Major Closed
   Improvement NUTCH-603 FIXED Add more default url normalizations Major Closed
   New Feature NUTCH-586 FIXED Add option to run compiled classes w/o job file Major Closed
   Improvement NUTCH-279 FIXED Additions for regex-normalize Major Closed
   Improvement NUTCH-602 FIXED Allow configurable number of handlers for search servers Major Closed
   Improvement NUTCH-565 FIXED Arc File to Nutch Segments Converter Major Closed
   Improvement NUTCH-488 FIXED Avoid parsing unnecessary links and get a more relevant outlink list Major Closed
   Improvement NUTCH-485 FIXED Change HtmlParseFilters to return ParseResult object instead of Parse object Major Closed
   Improvement NUTCH-605 FIXED Change deprecated configuration methods for Hadoop Major Closed
   Bug NUTCH-643 FIXED ClassCastException in PdfParser on encrypted PDF with empty password Major Closed
   Bug NUTCH-545 FIXED Configuration and OnlineClusterer get initialized in every request. Major Closed
   Improvement NUTCH-669 FIXED Consolidate code for Fetcher and Fetcher2 Major Closed
   Bug NUTCH-532 FIXED CrawlDbMerger: wrong computation of last fetch time Major Closed
   New Feature NUTCH-684 FIXED Dedup support for Solr Major Closed
   Bug NUTCH-467 FIXED DeleteDuplicate fails if Segment index directory has 0 documents Major Closed
   Bug NUTCH-525 FIXED DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment Major Closed
   Improvement NUTCH-668 FIXED Domain URL Filter Major Closed
   Bug NUTCH-613 FIXED Empty Summaries and Cached Pages Major Closed
   Bug NUTCH-497 FIXED Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap Major Closed
   Bug NUTCH-579 FIXED Feed plugin only indexes one post per feed due to identical digest Major Closed
   Bug NUTCH-413 FIXED Fetcher ignores -noParsing command line option Major Closed
   Bug NUTCH-597 FIXED Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. Major Closed
   Bug NUTCH-474 FIXED Fetcher2 sets server-delay and blocking checks incorrectly Major Closed
   Bug NUTCH-126 FIXED Fetching via https does not work with a proxy (patch) Major Closed
   Bug NUTCH-518 FIXED Fix OpicScoringFilter to respect scoring filter chaining Major Closed
   Bug NUTCH-382 FIXED Fix for NUTCH-365 introduced a bug if generate.max.per.host.by.ip is enabled Major Closed
   Bug NUTCH-471 FIXED Fix synchronization in NutchBean creation Major Closed
   New Feature NUTCH-74 FIXED French Analyzer Plugin Major Closed
   Bug NUTCH-503 FIXED Generator exits incorrectly for small fetchlists Major Closed
   Bug NUTCH-554 FIXED Generator throws java.io.IOException and dies on injected urls with no protocol Major Closed
   Bug NUTCH-636 FIXED Http client plug-in https doesn't work on IBM JRE Major Closed
   Bug NUTCH-561 FIXED HttpClient plugin does not work with NTLM authentication Major Closed
   Improvement NUTCH-501 FIXED Implement a different caching mechanism for objects cached in configuration Major Closed
   Bug NUTCH-574 FIXED Including inlink anchor text in index can create irrelevant search results. Major Closed
   Improvement NUTCH-510 FIXED IndexMerger delete working dir Major Closed
   Bug NUTCH-393 FIXED Indexer doesn't handle null documents returned by filters Major Closed
   New Feature NUTCH-442 FIXED Integrate Solr/Nutch Major Closed
   Bug NUTCH-671 FIXED JSP errors in Nutch searcher webapp running with Tomcat 6 Major Closed
   Bug NUTCH-723 FIXED LICENCE.txt is lacking info that should be there Major Closed
   New Feature NUTCH-635 FIXED LinkAnalysis Tool for Nutch Major Closed
   Bug NUTCH-533 FIXED LinkDbMerger: url normalized is not updated in the key and inlinks list Major Closed
   New Feature NUTCH-261 FIXED Multi Language Support Major Closed
   Bug NUTCH-725 FIXED NOTICE.txt is lacking info that should be there Major Closed
   Bug NUTCH-575 FIXED NPE in OpenSearchServlet when summary is null Major Closed
   Improvement NUTCH-559 FIXED NTLM, Basic and Digest Authentication schemes for web/proxy server Major Closed
   Bug NUTCH-504 FIXED NUTCH-443 broke parsing during fetching Major Closed
   Bug NUTCH-487 FIXED Neko HTML parser goes on default settings. Major Closed
   New Feature NUTCH-646 FIXED New Indexing Framework for Nutch Major Closed
   Bug NUTCH-516 FIXED Next fetch time is not set when it is a CrawlDatum.STATUS_FETCH_GONE Major Closed
   Bug NUTCH-529 FIXED NodeWalker.skipChildren doesn't work for more than 1 child. Major Closed
   Bug NUTCH-593 FIXED Nutch crawl problem Major Closed
   Improvement NUTCH-506 FIXED Nutch should delegate compression to Hadoop Major Closed
   Improvement NUTCH-614 FIXED Order Inlinks by OPIC score of parent page Major Closed
   New Feature NUTCH-392 FIXED OutputFormat implementations should pass on Progressable Major Closed
   Bug NUTCH-220 FIXED PDF Box can't parse document: java.lang.NullPointerException Major Closed
   Bug NUTCH-550 FIXED Parse fails if db.max.outlinks.per.page is -1 Major Closed
   Bug NUTCH-645 FIXED Parse-swf unit test failing Major Closed
   Bug NUTCH-535 FIXED ParseData's contentMeta accumulates unnecessary values during parse Major Closed
   Improvement NUTCH-634 FIXED Patch - Nutch - Hadoop 0.17.1 Major Closed
   Bug NUTCH-726 FIXED README.txt is lacking info that should be there Major Closed
   Bug NUTCH-615 FIXED Redirected URLs are fetched without setting any FetchInterval Major Closed
   Improvement NUTCH-547 FIXED Redirection handling: YahooSlurp's algorithm Major Closed
   Task NUTCH-339 FIXED Refactor nutch to allow fetcher improvements Major Closed
   Improvement NUTCH-598 FIXED Remove deprecated use of ToolBase, Migration to the new implementation Major Closed
   Improvement NUTCH-434 FIXED Replace usage of ObjectWritable with something based on GenericWritable Major Closed
   Bug NUTCH-616 FIXED Reset Fetch Retry counter when fetch is successful Major Closed
   New Feature NUTCH-647 FIXED Resolve URLs tool Major Closed
   Bug NUTCH-682 FIXED SOLR indexer does not set boost on the document Major Closed
   Improvement NUTCH-534 FIXED SegmentMerger: add -normalize option Major Closed
   Bug NUTCH-715 FIXED Subcollection plugin doesn't work with default subcollections.xml file Major Closed
   Bug NUTCH-153 FIXED TextParser is only supposed to parse plain text, but if given postscript, it can take hours and then fail Major Closed
   Bug NUTCH-618 FIXED Tika error "Media type alias already exists" Major Closed
   New Feature NUTCH-439 FIXED Top Level Domains Indexing / Scoring Major Closed
   Bug NUTCH-612 FIXED URL filtering is always disabled in Generator when invoked by Crawl Major Closed
   Improvement NUTCH-489 FIXED URLFilter-suffix management of the url path when the url contains some query parameters Major Closed
   Bug NUTCH-642 FIXED Unit tests fail when run in non-local mode Major Closed
   Bug NUTCH-607 FIXED Update build.xml to include tika jar in war file Major Closed
   Improvement NUTCH-691 FIXED Update jakarta poi jars to the most relevant version Major Closed
   Improvement NUTCH-552 FIXED Upgrade Nutch to Hadoop 0.15.x Major Closed
   Improvement NUTCH-604 FIXED Upgrade Nutch to Lucene 2.3.0 Major Closed
   Improvement NUTCH-587 FIXED Upgrade Nutch to use Hadoop 0.15.3 release Major Closed
   Improvement NUTCH-611 FIXED Upgrade Nutch to use Hadoop 0.16 Major Closed
   Improvement NUTCH-663 FIXED Upgrade Nutch to use Hadoop 0.19 Major Closed
   Improvement NUTCH-662 FIXED Upgrade Nutch to use Lucene 2.4 Major Closed
   Improvement NUTCH-608 FIXED Upgrade nutch to use released apache-tika-0.1-incubating Major Closed
   Improvement NUTCH-653 FIXED Upgrade to hadoop 0.18 Major Closed
   Bug NUTCH-517 FIXED build encoding should be UTF-8 Major Closed
   Bug NUTCH-626 FIXED fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects Major Closed
   Bug NUTCH-546 FIXED file URL are filtered out by the crawler Major Closed
   Bug NUTCH-481 FIXED http.content.limit is broken in the protocol-httpclient plugin Major Closed
   Bug NUTCH-695 FIXED incorrect mime type detection by MoreIndexingFilter plugin Major Closed
   Bug NUTCH-507 FIXED lib-lucene-analyzers jar definition is wrong in plugin.xml Major Closed
   New Feature NUTCH-25 FIXED needs 'character encoding' detector Major Closed
   Bug NUTCH-120 FIXED one "bad" link on a page kills parsing Major Closed
   Bug NUTCH-353 FIXED pages that serverside forwards will be refetched every time Major Closed
   Bug NUTCH-681 FIXED parse-mp3 compilation problem Major Closed
   Bug NUTCH-571 FIXED parse-mp3 plugin doesn't always index album of mp3 Major Closed
   Bug NUTCH-560 FIXED protocol-httpclient reading more bytes than http.content.limit Major Closed
   Bug NUTCH-419 FIXED unavailable robots.txt kills fetch Major Closed
   Bug NUTCH-584 FIXED urls missing from fetchlist Major Closed
   Improvement NUTCH-530 WON'T FIX Add a combiner to improve performance on updatedb Major Closed
   Task NUTCH-637 WON'T FIX Add method to nutch and tika system(Code written) Major Closed
   Improvement NUTCH-486 WON'T FIX Break searcher dependency on commons-cli Major Closed
   Bug NUTCH-632 WON'T FIX Bug in TextParser with encoding Major Closed
   Bug NUTCH-748 WON'T FIX DiskChecker Could not find Major Closed
   Improvement NUTCH-590 WON'T FIX Index multiple docs per call using IndexingFilter extension point Major Closed
   New Feature NUTCH-82 WON'T FIX Nutch Commands should run on Windows without external tools Major Closed
   Wish NUTCH-155 WON'T FIX Remove web gui from the distribution to "contrib" and use OpenSearch Servlet Major Closed
   Improvement NUTCH-526 WON'T FIX Use a combiner in LinkDbMerger to improve the performance as in LinkDb Major Closed
   Improvement NUTCH-357 WON'T FIX crawling simulation Major Closed
   Improvement NUTCH-661 WON'T FIX errors when the uri contains space characters Major Closed
   Bug NUTCH-599 WON'T FIX nutch crawl and index problem Major Closed
   Bug NUTCH-630 DUPLICATE Error caused by index-more plugin in the latest svn revision - 652259 Major Closed
   Bug NUTCH-592 DUPLICATE Fetcher2 : NPE for page with status ProtocolStatus.TEMP_MOVED Major Closed
   Bug NUTCH-701 DUPLICATE Replace Fetcher with Fetcher2 Major Closed
   Bug NUTCH-491 DUPLICATE dedup fails with ArrayIndexOutOfBoundsException Major Closed
   Bug NUTCH-572 INVALID Scoring and redirected Urls Major Closed
   New Feature NUTCH-452 INCOMPLETE Nutch JSF/My Faces Search Frontend Major Closed
   Sub-task NUTCH-262 INCOMPLETE NUTCH-261 Summary excerpts and highlights problems Major Closed
   Bug NUTCH-531 CANNOT REPRODUCE Pages with no ContentType cause a Null Pointer exception Major Closed
   Bug NUTCH-398 CANNOT REPRODUCE map-reduce very slow when crawling on single server Major Closed
   Improvement NUTCH-687 FIXED Add RAT Minor Closed
   Improvement NUTCH-500 FIXED Add hadoop masters configuration file into conf folder Minor Closed
   Improvement NUTCH-582 FIXED Add missing type parameters Minor Closed
   Improvement NUTCH-345 FIXED Add support for Content-Encoding: deflated Minor Closed
   Improvement NUTCH-765 FIXED Allow Crawl class to call Either Solr or Lucene Indexer Minor Closed
   Bug NUTCH-620 FIXED BasicURLNormalizer should collapse runs of slashes with a single slash Minor Closed
   Bug NUTCH-502 FIXED Bug in SegmentReader causes infinite loop Minor Closed
   Improvement NUTCH-639 FIXED Change LuceneDocumentWrapper visibility from private to protected Minor Closed
   Bug NUTCH-161 FIXED Change Plain text parser to use parser.character.encoding.default property for fall back encoding Minor Closed
   Improvement NUTCH-528 FIXED CrawlDbReader: add some new stats + dump into a csv format Minor Closed
   Bug NUTCH-494 FIXED FindBugs: CrawlDbReader and DeleteDuplicates Minor Closed
   Bug NUTCH-539 FIXED HttpClient plugin does not work with BasicAuthentication Minor Closed
   New Feature NUTCH-563 FIXED Include custom fields in BasicQueryFilter Minor Closed
   Bug NUTCH-711 FIXED Indexer failing after upgrade to Hadoop 0.19.1 Minor Closed
   Improvement NUTCH-514 FIXED Indexer should only index pages with fetch status SUCCESS Minor Closed
   Improvement NUTCH-667 FIXED Input Format for working with Content in Hadoop Streaming Minor Closed
   Improvement NUTCH-676 FIXED MapWritable is written inefficiently and confusingly Minor Closed
   Improvement NUTCH-548 FIXED Move URLNormalizer from Outlink to ParseOutputFormat Minor Closed
   Bug NUTCH-683 FIXED NUTCH-676 broke CrawlDbMerger Minor Closed
   Improvement NUTCH-505 FIXED Outlink urls should be validated Minor Closed
   Bug NUTCH-411 FIXED Parse ignores meta refresh redirection Minor Closed
   Bug NUTCH-633 FIXED ParseSegment no longer allow reparsing Minor Closed
   Bug NUTCH-596 FIXED ParseSegments parse content even if its not CrawlDatum.STATUS_FETCH_SUCCESS Minor Closed
   Improvement NUTCH-444 FIXED Possibly use a different library to parse RSS feed for improved performance and compatibility Minor Closed
   Improvement NUTCH-567 FIXED Proper (?) handling of URIs in TagSoup. Minor Closed
   Improvement NUTCH-601 FIXED Recrawling on existing crawl directory using force option Minor Closed
   Improvement NUTCH-536 FIXED Reduce number of warnings in nutch core Minor Closed
   Bug NUTCH-606 FIXED Refactoring of Generator, run all urls through checks Minor Closed
   Improvement NUTCH-651 FIXED Remove bin/{start|stop}-balancer.sh from svn tracking Minor Closed
   Improvement NUTCH-580 FIXED Remove deprecated hadoop api calls (FS) Minor Closed
   Bug NUTCH-446 FIXED RobotRulesParser should ignore Crawl-delay values of other bots in robots.txt Minor Closed
   Improvement NUTCH-468 FIXED Scoring filter should distribute score to all outlinks at once Minor Closed
   New Feature NUTCH-665 FIXED Search Load Testing Tool Minor Closed
   Bug NUTCH-495 FIXED Unnecessary delays in Fetcher2 Minor Closed
   Improvement NUTCH-680 FIXED Update external jars to latest versions Minor Closed
   Improvement NUTCH-544 FIXED Upgrade Carrot2 clustering plugin to the newest stable release (2.1) Minor Closed
   Improvement NUTCH-498 FIXED Use Combiner in LinkDb to increase speed of linkdb generation Minor Closed
   Improvement NUTCH-522 FIXED Use URLValidator in the Injector Minor Closed
   New Feature NUTCH-443 FIXED allow parsers to return multiple Parse object, this will speed up the rss parser Minor Closed
   Bug NUTCH-359 FIXED extraction of links will fail for whole page if one single link cannot be parsed Minor Closed
   Improvement NUTCH-456 FIXED parse msexcel plugin speedup Minor Closed
   Bug NUTCH-247 FIXED robot parser to restrict. Minor Closed
   Improvement NUTCH-720 FIXED site: search operator with no query term Minor Closed
   Improvement NUTCH-171 WON'T FIX Bring back multiple segment support for Generate / Update Minor Closed
   Improvement NUTCH-451 WON'T FIX Tool to recover partial fetcher output Minor Closed
   Improvement NUTCH-509 WON'T FIX Update Crawldb: avoid to start a job if there is no valid segment Minor Closed
   Improvement NUTCH-330 WON'T FIX command line tool to search a Lucene index Minor Closed
   Improvement NUTCH-553 DUPLICATE Add more normalization rules to regex-normalize file. Minor Closed
   Improvement NUTCH-448 LATER Allow Plugin Includes and Excludes from File Minor Closed
   Improvement NUTCH-223 FIXED Crawl.java uses Integer.MAX_VALUE for -topN where Generator.java uses Long.MAX_VALUE for -topN Trivial Closed
   Improvement NUTCH-538 FIXED Delete unused classes under o.a.n.util Trivial Closed
   Bug NUTCH-484 FIXED Nutch Nightly API link is broken in site Trivial Closed
   Improvement NUTCH-499 FIXED Refactor LinkDb and LinkDbMerger to reuse code Trivial Closed
   Bug NUTCH-482 FIXED Remove redundant plugin lib-log4j Trivial Closed
   Bug NUTCH-483 FIXED remove redundant commons-logging jar from ontology plugin Trivial Closed
   Improvement NUTCH-513 FIXED suffix-urlfilter.txt does not have a template Trivial Closed
   Bug NUTCH-654 FIXED urlfilter-regex's main does not work Trivial Closed
Nutch 0.9 release
   Bug NUTCH-354 FIXED MapWritable, nextEntry is not reset when Entries are recycled Blocker Closed
   Task NUTCH-400 FIXED Update & add missing license headers Blocker Closed
   Bug NUTCH-273 FIXED When a page is redirected, the original url is NOT updated. Blocker Closed
   Bug NUTCH-332 FIXED doubling score causes by page internal anchors. Blocker Closed
   Bug NUTCH-233 FIXED wrong regular expression hang reduce process for ever Blocker Closed
   Bug NUTCH-336 FIXED Harvested links shouldn't get db.score.injected in addition to inbound contributions Critical Closed
   Bug NUTCH-341 FIXED IndexMerger now deletes entire <workingdir> after completing Critical Closed
   Bug NUTCH-105 FIXED Network error during robots.txt fetch causes file to be ignored Critical Closed
   Improvement NUTCH-167 FIXED Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE"> directive Critical Closed
   Bug NUTCH-361 FIXED generator create fetchlist randomly Critical Closed
   Bug NUTCH-433 FIXED java.io.EOFException in newer nightlies in mergesegs or indexing from hadoop.io.DataOutputBuffer Critical Closed
   Bug NUTCH-318 FIXED log4j not proper configured, readdb doesnt give any information Critical Closed
   Bug NUTCH-350 FIXED urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE Critical Closed
   Bug NUTCH-381 WON'T FIX Ignore external link not work as expected Critical Closed
   Bug NUTCH-277 CANNOT REPRODUCE Fetcher dies because of "max. redirects" (avoiding infinite loop) Critical Closed
   Bug NUTCH-331 CANNOT REPRODUCE Fetcher incorrectly reports task progress to tasktracker resulting in skipped URLs Critical Closed
   Bug NUTCH-258 CANNOT REPRODUCE Once Nutch logs a SEVERE log item, Nutch fails forevermore Critical Closed
   Bug NUTCH-417 FIXED After upgrade to hadoop-0.9.1, parsing and indexing doesn't work. Major Closed
   Bug NUTCH-340 FIXED Bug(s) in 0.8 tutorial Major Closed
   Bug NUTCH-347 FIXED Build: plugins' Jars not found Major Closed
   Bug NUTCH-405 FIXED Content object is not properly initialized in map method of ParseSegment Major Closed
   Improvement NUTCH-416 FIXED CrawlDatum status and CrawlDbReducer refactoring Major Closed
   Bug NUTCH-371 FIXED DeleteDuplicates should remove documents with duplicate URLs Major Closed
   Bug NUTCH-367 FIXED DistributedSearch throws ClassCastException Major Closed
   Bug NUTCH-322 FIXED Fetcher discards ProtocolStatus, doesn't store redirected pages Major Closed
   Bug NUTCH-337 FIXED Fetcher ignores the fetcher.parse value configured in config file Major Closed
   Bug NUTCH-344 FIXED Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks Major Closed
   Bug NUTCH-404 FIXED Fix LinkDB Usage - implementation mismatch Major Closed
   Bug NUTCH-418 FIXED Fixes parsing of XHTML (e.g. title) Major Closed
   Improvement NUTCH-365 FIXED Flexible URL normalization Major Closed
   Bug NUTCH-415 FIXED Generate should mark selected records in crawlDB Major Closed
   Bug NUTCH-401 FIXED Hardcoded /tmp directory in SegmentReader Major Closed
   Improvement NUTCH-395 FIXED Increase fetching speed Major Closed
   Bug NUTCH-432 FIXED JAVA_PLATFORM with spaces (i.e. Mac OS X-ppc-32) breaks bin/nutch script Major Closed
   Improvement NUTCH-403 FIXED Make URL filtering optional in Generator Major Closed
   Bug NUTCH-437 FIXED MapFile in Hadoop Trunk has changed, must update references Major Closed
   Improvement NUTCH-378 FIXED MetaWrapper decorator Major Closed
   Bug NUTCH-406 FIXED Metadata tries to write null values Major Closed
   New Feature NUTCH-646 FIXED New Indexing Framework for Nutch Major Closed
   New Feature NUTCH-253 FIXED Normalize Host during Generate Major Closed
   Bug NUTCH-428 FIXED NullPointerException Major Closed
   Improvement NUTCH-614 FIXED Order Inlinks by OPIC score of parent page Major Closed
   Bug NUTCH-379 FIXED ParseUtil does not pass through the content's URL to the ParserFactory Major Closed
   Bug NUTCH-391 FIXED ParseUtil logs file contents to log file when it cannot find parser Major Closed
   Bug NUTCH-384 FIXED Protocol-file plugin does not allow the parse plugins framework to operate properly Major Closed
   Bug NUTCH-362 FIXED Remove parse-text from unsupported filetypes in parse-plugins.xml Major Closed
   Bug NUTCH-394 FIXED Searching via Tomcat / nutch-0.9-dev.war raises exception Major Closed
   Task NUTCH-360 FIXED Switch nutch to use java 5 source format Major Closed
   Bug NUTCH-305 FIXED Update crawl and url filter lists to exclude jpeg|JPEG|bmp|BMP Major Closed
   Improvement NUTCH-459 FIXED Upgrade Nutch to Hadoop 0.12.1 Major Closed
   Improvement NUTCH-383 FIXED Upgrade Nutch to Hadoop 0.7 Major Closed
   Bug NUTCH-205 FIXED Wrong 'fetch date' for non available pages Major Closed
   Bug NUTCH-266 FIXED hadoop bug when doing updatedb Major Closed
   Bug NUTCH-387 FIXED host normalization in Generator$Selector Major Closed
   Bug NUTCH-430 FIXED integer overflow in HashComparator.compare Major Closed
   Bug NUTCH-425 FIXED parse-js pollutes anchor text with base URL of source page Major Closed
   Bug NUTCH-374 FIXED when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. Major Closed
   Bug NUTCH-675 WON'T FIX Reduce tasks do not report their status and are killed by jobtracker Major Closed
   Bug NUTCH-543 DUPLICATE CLONE -some problem about the Nutch cache Major Closed
   Improvement NUTCH-581 FIXED DistributedSearch does not update search servers added to search-servers.txt on the fly Minor Resolved
   New Feature NUTCH-68 FIXED A tool to generate arbitrary fetchlists Minor Closed
   Improvement NUTCH-421 FIXED Allow predeterminate running order of index filters Minor Closed
   Task NUTCH-399 FIXED Change CommandRunner to use concurrent api from jdk Minor Closed
   Improvement NUTCH-440 FIXED Command line utilities should exit with an error message when given wrong arguments Minor Closed
   Improvement NUTCH-226 FIXED CrawlDb Filter tool Minor Closed
   Bug NUTCH-420 FIXED DeleteDuplicates.HashPartitioner depends on the order of IndexDocs Minor Closed
   Bug NUTCH-274 FIXED Empty row in/at end of URL-list results in error Minor Closed
   Bug NUTCH-325 FIXED UrlFilters.java throws NPE in case urlfilter.order contains Filters that are not in plugin.includes Minor Closed
   Bug NUTCH-388 FIXED nutch-default.xml has outdated example for urlfilter.order Minor Closed
   Bug NUTCH-426 FIXED parse-js skips parsing if found URL fails java.net.URL parse Minor Closed
   Bug NUTCH-246 FIXED segment size is never as big as topN or crawlDB size in a distributed deployment Minor Closed
   Bug NUTCH-524 WON'T FIX Generate Problem with Single Node Minor Closed
   Bug NUTCH-390 FIXED Javadoc warnings Trivial Closed
   Improvement NUTCH-338 FIXED Remove the text parser as an option for parsing PDF files in parse-plugins.xml Trivial Closed
Maintenance release for 0.8 branch
   Bug NUTCH-354 FIXED MapWritable, nextEntry is not reset when Entries are recycled Blocker Closed
   Bug NUTCH-332 FIXED doubling score causes by page internal anchors. Blocker Closed
   Bug NUTCH-336 FIXED Harvested links shouldn't get db.score.injected in addition to inbound contributions Critical Closed
   Bug NUTCH-341 FIXED IndexMerger now deletes entire <workingdir> after completing Critical Closed
   Bug NUTCH-105 FIXED Network error during robots.txt fetch causes file to be ignored Critical Closed
   Bug NUTCH-318 FIXED log4j not proper configured, readdb doesnt give any information Critical Closed
   Bug NUTCH-350 FIXED urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE Critical Closed
   Bug NUTCH-337 FIXED Fetcher ignores the fetcher.parse value configured in config file Major Closed
   Bug NUTCH-344 FIXED Fetcher threads blocked on synchronized block in cleanExpiredServerBlocks Major Closed
   Bug NUTCH-462 FIXED Noarchive urls are available via the cache link Major Closed
   Bug NUTCH-205 FIXED Wrong 'fetch date' for non available pages Major Closed
   Bug NUTCH-266 FIXED hadoop bug when doing updatedb Major Closed
   Improvement NUTCH-338 FIXED Remove the text parser as an option for parsing PDF files in parse-plugins.xml Trivial Closed

Project Summary

Open          202   27%
In Progress     3
Reopened        1
Resolved       12    2%
Closed        542   71%

Open Issues

By Priority
Critical        1
Major         101   49%
Minor          83   40%
Trivial        21   10%

By Assignee
Andrzej Bialecki       5    2%
Chris A. Mattmann      7    3%
Chris Schneider        1
Dennis Kubes          11    5%
Doug Cutting           2    1%
Doğacan Güney          1
Enis Soztutar          3    1%
Jerome Charron         2    1%
Otis Gospodnetic       4    2%
Sami Siren             3    1%
Unassigned           167   81%
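
The percentages in the summary appear to be each count's share of its group total, rounded to the nearest whole number. That rounding rule is an assumption; the page does not state it. The sketch below recomputes the figures from the counts listed above. The names `status_counts`, `open_by_priority`, and `percentages` are hypothetical helpers for illustration, not part of Nutch or JIRA.

```python
# Counts copied from the Project Summary and Open Issues sections above.
status_counts = {
    "Open": 202,
    "In Progress": 3,
    "Reopened": 1,
    "Resolved": 12,
    "Closed": 542,
}
open_by_priority = {"Critical": 1, "Major": 101, "Minor": 83, "Trivial": 21}


def percentages(counts):
    """Return each count as a whole-number percentage of the group total.

    Assumption: the page rounds to the nearest integer.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


status_pct = percentages(status_counts)
priority_pct = percentages(open_by_priority)

# Issues that are not yet resolved or closed.
unresolved = (
    status_counts["Open"]
    + status_counts["In Progress"]
    + status_counts["Reopened"]
)

for name, pct in status_pct.items():
    print(f"{name:12} {status_counts[name]:4} {pct:3}%")
print("Unresolved total:", unresolved)
```

Under that assumption the recomputed values match the listed ones (27%, 2%, 71% by status; 49%, 40%, 10% by priority). The priority breakdown sums to 206, which equals Open + In Progress + Reopened, so the "Open Issues" section seems to count every unresolved issue rather than only those in the Open status.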