
Key: NUTCH-193
Type: Task
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Doug Cutting
Reporter: Doug Cutting
Votes: 1
Watchers: 0
Nutch

move NDFS and MapReduce to a separate project

Created: 01/Feb/06 02:52 AM   Updated: 24/Oct/06 04:14 PM
Component/s: ndfs
Affects Version/s: 0.8
Fix Version/s: 0.8

Time Tracking: Not Specified

Issue Links:
Incorporates
 

Resolution Date: 04/Feb/06 09:49 AM


Description
The NDFS and MapReduce code should move from Nutch to a new Lucene sub-project named Hadoop.

My plan is to do this as follows:

1. Move all code in the following packages from Nutch to Hadoop:

org.apache.nutch.fs
org.apache.nutch.io
org.apache.nutch.ipc
org.apache.nutch.mapred
org.apache.nutch.ndfs

These packages will all be renamed from org.apache.nutch.* to org.apache.hadoop.* (for example, org.apache.nutch.io becomes org.apache.hadoop.io), and Nutch code will be updated to reflect this.

2. Move selected classes from Nutch to Hadoop, as follows:

org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable
org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured

org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon

3. Add a jar containing all of the above to Nutch's lib directory.
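Steps 1 and 2 together amount to a name lookup: exact matches for the individually moved classes, then a prefix rewrite for the whole packages. The sketch below is a hypothetical helper (HadoopRenames is not part of Nutch or Hadoop) that applies that mapping to fully qualified names and to single-line import statements. It copies the target names verbatim from the plan and assumes each moved package keeps its last name component (e.g. org.apache.nutch.ndfs -> org.apache.hadoop.ndfs), so the names in the released Hadoop jar may differ.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical migration helper (not part of Nutch or Hadoop) that applies the
 * NUTCH-193 rename plan to fully qualified names and import statements.
 */
public class HadoopRenames {

    // Step 2 of the plan: individual classes moved out of org.apache.nutch.util.
    // Target names are copied verbatim from the plan.
    private static final Map<String, String> CLASS_MOVES = new HashMap<>();

    // Step 1 of the plan: whole packages moved from org.apache.nutch to
    // org.apache.hadoop. ASSUMPTION: each keeps its last name component.
    private static final String[] MOVED_PACKAGES = {"fs", "io", "ipc", "mapred", "ndfs"};

    static {
        CLASS_MOVES.put("org.apache.nutch.util.NutchConf", "org.apache.hadoop.conf.Configuration");
        CLASS_MOVES.put("org.apache.nutch.util.NutchConfigurable", "org.apache.hadoop.Configurable");
        CLASS_MOVES.put("org.apache.nutch.util.NutchConfigured", "org.apache.hadoop.Configured");
        CLASS_MOVES.put("org.apache.nutch.util.Progress", "org.apache.hadoop.util.Progress");
        CLASS_MOVES.put("org.apache.nutch.util.LogFormatter", "org.apache.hadoop.util.LogFormatter");
        CLASS_MOVES.put("org.apache.nutch.util.Daemon", "org.apache.hadoop.util.Daemon");
    }

    /** Returns the proposed Hadoop name, or the input unchanged if it stays in Nutch. */
    public static String rename(String qualifiedName) {
        // Exact class moves are checked first.
        String moved = CLASS_MOVES.get(qualifiedName);
        if (moved != null) {
            return moved;
        }
        for (String pkg : MOVED_PACKAGES) {
            String oldPrefix = "org.apache.nutch." + pkg;
            // Require an exact match or a trailing dot so that a package that
            // merely starts with the same letters (e.g. "iox") is not rewritten.
            if (qualifiedName.equals(oldPrefix) || qualifiedName.startsWith(oldPrefix + ".")) {
                return "org.apache.hadoop." + pkg + qualifiedName.substring(oldPrefix.length());
            }
        }
        return qualifiedName;
    }

    /** Rewrites a single-line "import a.b.C;" statement; other lines pass through unchanged. */
    public static String rewriteImport(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("import ") || !trimmed.endsWith(";")) {
            return line;
        }
        // Static imports are not handled: "static a.b.C" never matches a rename.
        String name = trimmed.substring("import ".length(), trimmed.length() - 1).trim();
        return "import " + rename(name) + ";";
    }

    public static void main(String[] args) {
        // prints "import org.apache.hadoop.conf.Configuration;"
        System.out.println(rewriteImport("import org.apache.nutch.util.NutchConf;"));
        // prints "import org.apache.hadoop.io.*;"
        System.out.println(rewriteImport("import org.apache.nutch.io.*;"));
        // prints "import org.apache.nutch.parse.Parser;" (stays in Nutch)
        System.out.println(rewriteImport("import org.apache.nutch.parse.Parser;"));
    }
}
```

Classes that remain in Nutch, such as org.apache.nutch.util.ThreadPool, pass through unchanged, because only the six listed util classes move, not the whole util package.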

Does this plan sound reasonable?



Subversion Commits
Repository: ASF
Revision: #374796
Date: Sat Feb 04 00:38:32 UTC 2006
User: cutting
Message: NUTCH-193: MapReduce and NDFS code moved to new project, Hadoop. See bug report for details.
Files Changed
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseImpl.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/UrlNormalizerFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org/apache/nutch/parse/mp3/MP3Parser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/tools/DmozParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/Plugin.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParserFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/query-more/src/java/org/apache/nutch/searcher/more/DateQueryFilter.java
DEL /lucene/nutch/trunk/bin/slaves.sh
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/Outlink.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ProtocolFactory.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestOutlinkExtractor.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/Http.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/BasicUrlNormalizer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseText.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ContentProperties.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/tools/PruneIndexTool.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ProtocolStatus.java
MODIFY /lucene/nutch/trunk/src/plugin/query-url/src/java/org/apache/nutch/searcher/url/URLQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageQueryFilter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexMerger.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NdfsDirectory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Inlink.java
DEL /lucene/nutch/trunk/src/webapps
DEL /lucene/nutch/trunk/bin/nutch-daemon.sh
ADD /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/FsDirectory.java (from /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NdfsDirectory.java)
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/MD5Signature.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/Indexer.java
MODIFY /lucene/nutch/trunk/conf/nutch-site.xml.template
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/URLFilterChecker.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSorter.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpBasicAuthentication.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestParserFactory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/CommonGrams.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Inlinks.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/io
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/Parser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/net/TestBasicUrlNormalizer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/SignatureFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCParseFilter.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestParseText.java
MODIFY /lucene/nutch/trunk/src/plugin/query-basic/src/java/org/apache/nutch/searcher/basic/BasicQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/TestMSPowerPointParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-msword/src/test/org/apache/nutch/parse/msword/TestMSWordParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/FieldQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-msword/src/java/org/apache/nutch/parse/msword/MSWordParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClustererFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/mapred
MODIFY /lucene/nutch/trunk/src/plugin/build.xml
MODIFY /lucene/nutch/trunk/src/plugin/parse-text/src/java/org/apache/nutch/parse/text/TextParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParserChecker.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/PartitionUrlByHost.java
MODIFY /lucene/nutch/trunk/src/plugin/query-site/src/java/org/apache/nutch/searcher/site/SiteQueryFilter.java
ADD /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/fs
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/Content.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LinkDbInlinks.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/ContentReaderListener.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/ndfs
DEL /lucene/nutch/trunk/bin/start-all.sh
MODIFY /lucene/nutch/trunk/src/plugin/parse-swf/src/test/org/apache/nutch/parse/swf/TestSWFParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Hit.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/UrlNormalizer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Crawl.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDatum.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/NGramProfile.java
MODIFY /lucene/nutch/trunk/src/plugin/urlfilter-prefix/src/java/org/apache/nutch/net/PrefixURLFilter.java
MODIFY /lucene/nutch/trunk/conf/nutch-default.xml
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalysisTokenManager.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf/PdfParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/IndexSearcher.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/servlet/Cached.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseUtil.java
REPLACE /lucene/nutch/trunk/conf/mapred-default.xml.template
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCDeleteUnlicensedTool.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/Daemon.java
ADD /lucene/nutch/trunk/src/test/org/apache/nutch/util/WritableTestUtils.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/searcher/TestHitDetails.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/HitDetails.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalysis.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseStatus.java
DEL /lucene/nutch/trunk/bin/stop-all.sh
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/OutlinkExtractor.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/test/org/creativecommons/nutch/TestCCParseFilter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/DistributedSearch.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCIndexingFilter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/HtmlParseFilters.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/Http.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/PluginManifestParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Query.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang/TestLanguageIdentifier.java
MODIFY /lucene/nutch/trunk/conf/crawl-tool.xml
DEL /lucene/nutch/trunk/lib/jetty-5.1.4.jar
MODIFY /lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpAuthenticationFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageIndexingFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/PropertiesReaderListener.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/NutchBean.java
DEL /lucene/nutch/trunk/lib/jetty-5.1.4.LICENSE.txt
MODIFY /lucene/nutch/trunk/src/plugin/urlfilter-regex/src/java/org/apache/nutch/net/RegexURLFilter.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/ipc
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/LinkDbReader.java
ADD /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConfiguration.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/Extension.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/protocol/TestContent.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchDocumentAnalyzer.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/FetchedSegments.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/URLFilters.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Summarizer.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf/RTFParseFactory.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org/apache/nutch/parse/mp3/MetadataCollector.java
MODIFY /lucene/nutch/trunk/src/plugin/query-more/src/java/org/apache/nutch/searcher/more/TypeQueryFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-js/src/java/org/apache/nutch/parse/js/JSParseFilter.java
ADD /lucene/nutch/trunk/conf/hadoop-default.xml
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/plugin/TestPluginSystem.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/FtpResponse.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilters.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/HTMLLanguageParser.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/io
MODIFY /lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/QueryFilter.java
MODIFY /lucene/nutch/trunk/src/test/nutch-site.xml
MODIFY /lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf/SWFParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/mapred
DEL /lucene/nutch/trunk/lib/jetty-ext
MODIFY /lucene/nutch/trunk/bin/nutch
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/plugin/SimpleTestPlugin.java
MODIFY /lucene/nutch/trunk/src/plugin/ontology/src/test/org/apache/nutch/ontology/TestOntology.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseData.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/MSPowerPointParser.java
DEL /lucene/nutch/trunk/src/test/org/apache/nutch/ndfs
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReducer.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/searcher/TestQuery.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/segment/SegmentReader.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Hits.java
MODIFY /lucene/nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConf.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/URLFilter.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/LogFormatter.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/TextProfileSignature.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/DeleteDuplicates.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseOutputFormat.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/QueryFilters.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConfigurable.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LuceneQueryOptimizer.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip/TestZipParser.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/fs
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/protocol/TestContentProperties.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherOutput.java
MODIFY /lucene/nutch/trunk/src/plugin/ontology/src/java/org/apache/nutch/ontology/OwlParser.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/parse/TestParseData.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/PPTExtractor.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/PluginDescriptor.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/Progress.java
MODIFY /lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageIdentifier.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rss/src/test/org/apache/nutch/parse/rss/TestRSSParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParsePluginsReader.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/plugin/PluginRepository.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherOutputFormat.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/util/ThreadPool.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/util/NutchConfigured.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Signature.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-rtf/src/test/org/apache/nutch/parse/rtf/TestRTFParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/protocol/Protocol.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/net/RegexUrlNormalizer.java
DEL /lucene/nutch/trunk/bin/nutch-daemons.sh
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/ontology/OntologyFactory.java
MODIFY /lucene/nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/File.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/parse/HtmlParseFilter.java
MODIFY /lucene/nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
DEL /lucene/nutch/trunk/src/java/org/apache/nutch/ipc
MODIFY /lucene/nutch/trunk/src/plugin/ontology/src/java/org/apache/nutch/ontology/OntologyImpl.java
MODIFY /lucene/nutch/trunk/src/test/org/apache/nutch/analysis/TestQueryParser.java
MODIFY /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/ZipTextExtractor.java
MODIFY /lucene/nutch/trunk/src/plugin/parse-pdf/src/test/org/apache/nutch/parse/pdf/TestPdfParser.java
MODIFY /lucene/nutch/trunk/src/plugin/creativecommons/src/java/org/creativecommons/nutch/CCQueryFilter.java