
Key: NUTCH-193
Type: Task
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Doug Cutting
Reporter: Doug Cutting
Votes: 1
Watchers: 0
Nutch

move NDFS and MapReduce to a separate project

Created: 01/Feb/06 02:52 AM   Updated: 24/Oct/06 04:14 PM
Component/s: ndfs
Affects Version/s: 0.8
Fix Version/s: 0.8

Time Tracking:
Not Specified

Issue Links:
Incorporates: HADOOP-1

Resolution Date: 04/Feb/06 09:49 AM


Description
The NDFS and MapReduce code should move from Nutch to a new Lucene sub-project named Hadoop.

My plan is to do this as follows:

1. Move all code in the following packages from Nutch to Hadoop:

org.apache.nutch.fs
org.apache.nutch.io
org.apache.nutch.ipc
org.apache.nutch.mapred
org.apache.nutch.ndfs

These packages will all be renamed to org.apache.hadoop, and Nutch code will be updated to reflect this.

2. Move selected classes from Nutch to Hadoop, as follows:

org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable
org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured

org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon

3. Add a jar containing all of the above to Nutch's lib directory.
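To make the mapping in steps 1 and 2 concrete, here is a small Java sketch. HadoopRenamePlan is a hypothetical helper, not part of the commit: it maps a fully qualified Nutch class name to its planned Hadoop name. It follows the plan literally, using the target names exactly as written in step 2, and does not model the later ndfs-to-dfs rename discussed in the comments.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of the commit) that maps a fully qualified
 * Nutch class name to the Hadoop name given in the plan.
 */
public class HadoopRenamePlan {
  // Step 1: whole packages move from org.apache.nutch.* to org.apache.hadoop.*.
  private static final List<String> MOVED_PACKAGES = List.of("fs", "io", "ipc", "mapred", "ndfs");

  // Step 2: selected classes move individually, some under new names.
  private static final Map<String, String> MOVED_CLASSES = Map.of(
      "org.apache.nutch.util.NutchConf", "org.apache.hadoop.conf.Configuration",
      "org.apache.nutch.util.NutchConfigurable", "org.apache.hadoop.Configurable",
      "org.apache.nutch.util.NutchConfigured", "org.apache.hadoop.Configured",
      "org.apache.nutch.util.Progress", "org.apache.hadoop.util.Progress",
      "org.apache.nutch.util.LogFormatter", "org.apache.hadoop.util.LogFormatter",
      "org.apache.nutch.util.Daemon", "org.apache.hadoop.util.Daemon");

  /** Returns the planned Hadoop name, or the input unchanged if the class stays in Nutch. */
  public static String rename(String className) {
    String moved = MOVED_CLASSES.get(className);
    if (moved != null) {
      return moved;
    }
    for (String pkg : MOVED_PACKAGES) {
      String prefix = "org.apache.nutch." + pkg + ".";
      if (className.startsWith(prefix)) {
        // Keep everything after "org.apache.nutch.", so classes in subpackages move too.
        return "org.apache.hadoop." + className.substring("org.apache.nutch.".length());
      }
    }
    return className;
  }

  public static void main(String[] args) {
    System.out.println(rename("org.apache.nutch.mapred.JobConf")); // prints org.apache.hadoop.mapred.JobConf
    System.out.println(rename("org.apache.nutch.util.NutchConf")); // prints org.apache.hadoop.conf.Configuration
    System.out.println(rename("org.apache.nutch.crawl.Crawl"));    // prints org.apache.nutch.crawl.Crawl
  }
}
```

The prefix check includes the trailing dot, so a hypothetical package such as org.apache.nutch.ioutil would not be mistaken for org.apache.nutch.io.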

Does this plan sound reasonable?



Doug Cutting added a comment - 01/Feb/06 02:55 AM
Link to the related Nutch issue.

Doug Cutting made changes - 01/Feb/06 02:55 AM
Field: Link; New Value: This issue incorporates HADOOP-1
Doug Cutting added a comment - 01/Feb/06 03:06 AM
NDFS, the Nutch Distributed Filesystem, will be renamed HDFS, the Hadoop Distributed Filesystem. Its code will live in the package org.apache.nutch.dfs, and its fs implementation class will be named DistributedFileSystem.

Andrzej Bialecki added a comment - 01/Feb/06 03:19 AM
What timeframe did you have in mind? There are a few patches in the queue, which will be affected by this split.

Other than that - emphatic yes!


Sami Siren added a comment - 01/Feb/06 03:36 AM
+1

I guess the fuse-j/NDFS work from John and me could become part of Hadoop's contrib after this change?


Doug Cutting added a comment - 01/Feb/06 03:48 AM
Andrzej: I'd like to do this soon, this week or next. No matter how long I wait, there will probably always be a few patches queued that will need to be updated. But hopefully we can avoid large patches like NUTCH-169. What other patches are you concerned about in particular?

Sami: yes, the fuse stuff would then make a great hadoop contrib package.


Otis Gospodnetic added a comment - 01/Feb/06 03:58 AM
I assume Doug meant org.apache.hadoop.dfs, not org.apache.nutch.dfs.

Andrzej Bialecki added a comment - 01/Feb/06 04:01 AM
Ok, the sooner the better from my POV. I didn't have in mind anything that would go into Hadoop, but rather Nutch patches that I'm working on. Affected patches include some of the recent larger ones: the adaptive fetch schedule and crawl metadata. No big deal, but we need to know what to shoot for.

Doug Cutting added a comment - 01/Feb/06 04:39 AM
Otis: yes, thanks, I meant org.apache.hadoop.dfs.

Andrzej: I'm awaiting Mike's commit of NUTCH-183, which should happen today. I'll then try to make the split tomorrow.


John Xing added a comment - 03/Feb/06 04:28 PM
What's in the name Hadoop? Is it from "had oops"?

Repository: ASF; Revision: #374796; Date: Sat Feb 04 00:38:32 UTC 2006; User: cutting
Message: NUTCH-193: MapReduce and NDFS code moved to new project, Hadoop. See bug report for details.
Files Changed
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseImpl.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/UrlNormalizerFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org/apache/nutch/parse/mp3/MP3Parser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/tools/DmozParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/Plugin.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParserFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/query-more/src/java/org/apache/nutch/searcher/more/DateQueryFilter.java
DEL /lucene/nutch/trunk/bin/slaves.sh
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/Outlink.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ProtocolFactory.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestOutlinkExtractor.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/BasicUrlNormalizer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseText.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ContentProperties.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/tools/PruneIndexTool.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ProtocolStatus.java
MODIFY /lucene/nutch/trunk/src/plugin/query-url/src/java/org/apache/nutch/searcher/url/URLQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageQueryFilter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexMerger.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NdfsDirectory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Inlink.java
DEL /lucene/nutch/trunk/src/webapps
DEL /lucene/nutch/trunk/bin/nutch-daemon.sh
ADD /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/FsDirectory.java (from /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NdfsDirectory.java)
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/MD5Signature.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java
MODIFY /lucene/nutch/trunk/conf/nutch-site.xml.template
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/URLFilterChecker.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSorter.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpBasicAuthentication.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestParserFactory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/CommonGrams.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Inlinks.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/io
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/Parser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/net/TestBasicUrlNormalizer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/SignatureFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCParseFilter.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestParseText.java
MODIFY /lucene/nutch/trunk/src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/TestMSPowerPointParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-msword/src/test/org/apache/nutch/parse/msword/TestMSWordParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/FieldQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-msword/src/java/org/apache/nutch/parse/msword/MSWordParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClustererFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/mapred
MODIFY /lucene/nutch/trunk/src/plugin/build.xml
MODIFY /lucene/nutch/trunk/src/plugin/parse-text/src/java/org/apache/nutch/parse/text/TextParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParserChecker.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/PartitionUrlByHost.java
MODIFY /lucene/nutch/trunk/src/plugin/query-site/src/java/org/apache/nutch/searcher/site/SiteQueryFilter.java
ADD /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/fs
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/Content.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LinkDbInlinks.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/ContentReaderListener.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/ndfs
DEL /lucene/nutch/trunk/bin/start-all.sh
MODIFY /lucene/nutch/trunk/src/plugin/parse-swf/src/test/org/apache/nutch/parse/swf/TestSWFParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Hit.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/UrlNormalizer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Crawl.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDatum.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/NGramProfile.java
MODIFY /lucene/nutch/trunk/src/plugin/urlfilter-prefix/src/java/org/apache/nutch/net/PrefixURLFilter.java
MODIFY /lucene/nutch/trunk/conf/nutch-default.xml
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalysisTokenManager.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf/PdfParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/IndexSearcher.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/servlet/Cached.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java
REPLACE /lucene/nutch/trunk/conf/mapred-default.xml.template
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCDeleteUnlicensedTool.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/Daemon.java
ADD /lucene/nutch/trunk/src/test/org/apache/nutch/util/WritableTestUtils.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/searcher/TestHitDetails.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/HitDetails.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalysis.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseStatus.java
DEL /lucene/nutch/trunk/bin/stop-all.sh
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/OutlinkExtractor.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/test/org/creativecommons/nutch/TestCCParseFilter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/DistributedSearch.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCIndexingFilter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/HtmlParseFilters.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/Http.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/PluginManifestParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Query.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang/TestLanguageIdentifier.java
MODIFY /lucene/nutch/trunk/conf/crawl-tool.xml
DEL /lucene/nutch/trunk/lib/jetty-5.1.4.jar
MODIFY /lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpAuthenticationFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageIndexingFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/PropertiesReaderListener.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/NutchBean.java
DEL /lucene/nutch/trunk/lib/jetty-5.1.4.LICENSE.txt
MODIFY /lucene/nutch/trunk/src/plugin/urlfilter-regex/src/java/org/apache/nutch/net/RegexURLFilter.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/ipc
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbReader.java
ADD /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConfiguration.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/Extension.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/protocol/TestContent.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchDocumentAnalyzer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/FetchedSegments.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/URLFilters.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Summarizer.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf/RTFParseFactory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org/apache/nutch/parse/mp3/MetadataCollector.java
MODIFY /lucene/nutch/trunk/src/plugin/query-more/src/java/org/apache/nutch/searcher/more/TypeQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-js/src/java/org/apache/nutch/parse/js/JSParseFilter.java
ADD /lucene/nutch/trunk/conf/hadoop-default.xml
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/plugin/TestPluginSystem.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/FtpResponse.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilters.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/HTMLLanguageParser.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/io
MODIFY /lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/QueryFilter.java
MODIFY /lucene/nutch/trunk/src/test/nutch-site.xml
MODIFY /lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/mapred
DEL /lucene/nutch/trunk/lib/jetty-ext
MODIFY /lucene/nutch/trunk/bin/nutch
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/plugin/SimpleTestPlugin.java
MODIFY /lucene/nutch/trunk/src/plugin/ontology/src/test/org/apache/nutch/ontology/TestOntology.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseData.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/MSPowerPointParser.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/ndfs
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReducer.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/searcher/TestQuery.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/segment/SegmentReader.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Hits.java
MODIFY /lucene/nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConf.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/URLFilter.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/LogFormatter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/TextProfileSignature.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/DeleteDuplicates.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/QueryFilters.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConfigurable.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LuceneQueryOptimizer.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip/TestZipParser.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/fs
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/protocol/TestContentProperties.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherOutput.java
MODIFY /lucene/nutch/trunk/src/plugin/ontology/src/java/org/apache/nutch/ontology/OwlParser.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestParseData.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/PPTExtractor.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/PluginDescriptor.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/Progress.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageIdentifier.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rss/src/test/org/apache/nutch/parse/rss/TestRSSParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParsePluginsReader.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/PluginRepository.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherOutputFormat.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/util/ThreadPool.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConfigured.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Signature.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rtf/src/test/org/apache/nutch/parse/rtf/TestRTFParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/Protocol.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/RegexUrlNormalizer.java
DEL /lucene/nutch/trunk/bin/nutch-daemons.sh
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/ontology/OntologyFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/File.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/HtmlParseFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/ipc
MODIFY /lucene/nutch/trunk/src/plugin/ontology/src/java/org/apache/nutch/ontology/OntologyImpl.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/analysis/TestQueryParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-pdf/src/test/org/apache/nutch/parse/pdf/TestPdfParser.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCQueryFilter.java

Doug Cutting added a comment - 04/Feb/06 01:53 AM
The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid's term.

Doug Cutting added a comment - 04/Feb/06 06:18 AM
Okay, I've moved the code from Nutch to Hadoop. Now I need to repair Nutch so that it still works!

One remaining problem is the need to separate nutch config files from hadoop config files. There's now a hadoop-default.xml and hadoop-site.xml, which are separate from the similarly-named nutch files. For now, I'll fix this by adding the following methods to Hadoop's Configuration class:

void addDefaultResource(String name);
void addFinalResource(String name);

Then add a Nutch utility class like:

import org.apache.hadoop.conf.Configuration;
public class NutchConfiguration {
  /** Returns a new Configuration with Nutch's resources added on top of Hadoop's. */
  public static Configuration create() { return addNutchResources(new Configuration()); }
  public static Configuration addNutchResources(Configuration conf) {
    conf.addDefaultResource("nutch-default.xml"); conf.addFinalResource("nutch-site.xml"); return conf;
  }
}

Then all of the places which currently call 'new NutchConf()' can be replaced with 'NutchConfiguration.create()'.

Longer-term we might consider a more radical re-design of the configuration API. But first we need to get Hadoop and Nutch split.
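The intended layering of default and final resources can be illustrated with a self-contained toy model. LayeredConfigSketch, its in-memory "files" map, and the demo.* keys are hypothetical stand-ins, not Hadoop's Configuration class. The sketch assumes that all default resources are read in order, then all final resources, with later values overriding earlier ones; the real loading order in Hadoop may differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of layered configuration resources (hypothetical; not Hadoop's
 * Configuration class). Default resources are applied first, then final
 * resources, and a later resource overrides keys set by an earlier one.
 */
public class LayeredConfigSketch {
  // In-memory stand-in for the XML files: resource name -> key/value pairs.
  private final Map<String, Map<String, String>> files;
  private final List<String> defaultResources = new ArrayList<>(List.of("hadoop-default.xml"));
  private final List<String> finalResources = new ArrayList<>(List.of("hadoop-site.xml"));

  public LayeredConfigSketch(Map<String, Map<String, String>> files) {
    this.files = files;
  }

  public void addDefaultResource(String name) { defaultResources.add(name); }

  public void addFinalResource(String name) { finalResources.add(name); }

  /** Merges all default resources, then all final resources, and looks up the key. */
  public String get(String key) {
    List<String> order = new ArrayList<>(defaultResources);
    order.addAll(finalResources);
    Map<String, String> merged = new HashMap<>();
    for (String name : order) {
      // A resource with no entry in the map contributes nothing.
      merged.putAll(files.getOrDefault(name, Map.of()));
    }
    return merged.get(key);
  }

  /** Analogue of the proposed NutchConfiguration.create(). */
  public static LayeredConfigSketch createNutch(Map<String, Map<String, String>> files) {
    LayeredConfigSketch conf = new LayeredConfigSketch(files);
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }

  public static void main(String[] args) {
    // Made-up keys and values, for illustration only.
    Map<String, Map<String, String>> files = Map.of(
        "hadoop-default.xml", Map.of("demo.filesystem", "local"),
        "nutch-default.xml", Map.of("demo.agent", "default-agent"),
        "nutch-site.xml", Map.of("demo.agent", "my-crawler"));
    LayeredConfigSketch conf = createNutch(files);
    System.out.println(conf.get("demo.filesystem")); // prints local
    System.out.println(conf.get("demo.agent"));      // prints my-crawler
  }
}
```

In this model a value in nutch-site.xml overrides the same key in nutch-default.xml, while keys defined only in Hadoop's files remain visible, which is the separation between Nutch and Hadoop config files that the comment describes.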


Doug Cutting added a comment - 04/Feb/06 09:49 AM
I just committed this. Phew!

Doug Cutting made changes - 04/Feb/06 09:49 AM
Field: Status; Original Value: Open; New Value: Resolved
Field: Resolution; New Value: Fixed
Mike Cafarella added a comment - 08/Feb/06 03:23 AM

It should be noted that the name "Nutch" also comes from one of Doug's children.
They seem to have a proud future in advertising and product naming.


Sami Siren added a comment - 24/Oct/06 04:14 PM
closing issues for released versions

Sami Siren made changes - 24/Oct/06 04:14 PM
Field: Status; Original Value: Resolved; New Value: Closed