Doug Cutting added a comment - 01/Feb/06 02:55 AM
Link to the related Nutch issue.
NDFS, the Nutch Distributed Filesystem will be renamed HDFS, the Hadoop Distributed Filesystem. Its code will live in the package org.apache.nutch.dfs, and its fs implementation class will be named DistributedFileSystem.
What timeframe did you have in mind? There are a few patches in the queue, which will be affected by this split.
Other than that - emphatic yes! +1
I guess the fuse-j - ndfs work from John/me could be part of hadoop /contrib after this change?
Andrzej: I'd like to do this soon, this week or next. No matter how long I wait, there will probably always be a few patches queued that will need to be updated. But hopefully we can avoid large patches like
Sami: yes, the fuse stuff would then make a great hadoop contrib package.
I assume Doug meant org.apache.hadoop.dfs, not org.apache.nutch.dfs.
Ok, the sooner the better from my POV. I didn't have anything in mind that would be included in Hadoop, rather Nutch patches that I'm working on. Affected patches include some of the recent larger ones: the adaptive fetch schedule thing and crawl metadata. No big deal, but we need to know what to shoot for.
Otis: yes, thanks, I meant org.apache.hadoop.dfs.
Andrzej: I'm awaiting Mike's commit of NUTCH-183, which should happen today. I'll then try to make the split tomorrow.
The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid's term.
Okay, I've moved the code from Nutch to Hadoop. Now I need to repair Nutch so that it still works!
One remaining problem is the need to separate nutch config files from hadoop config files. There's now a hadoop-default.xml and hadoop-site.xml, which are separate from the similarly-named nutch files. For now, I'll fix this by adding the following methods to Hadoop's Configuration class:
void addDefaultResource(String name);
Then add a Nutch utility class like:
public class NutchConfiguration {
Then all of the places which currently call 'new NutchConf()' can be replaced with 'NutchConfiguration.create()'. Longer-term we might consider a more radical re-design of the configuration API. But first we need to get Hadoop and Nutch split.
I just committed this. Phew!
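For readers following along, here is a rough sketch of how the proposal above could fit together. It is not the committed code: SketchConfiguration is a hypothetical stand-in for Hadoop's Configuration class, NutchConfigurationSketch is a hypothetical stand-in for the Nutch utility class, and the file name nutch-default.xml is assumed from the "similarly-named nutch files" remark. Only the addDefaultResource method named in the comment is modelled; site-file overrides and actual XML loading are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Hadoop's Configuration class (not the real API).
// It only records resource names; no XML is loaded in this sketch.
class SketchConfiguration {
    private final List<String> defaultResources = new ArrayList<>();

    SketchConfiguration() {
        // Hadoop's own defaults are always registered first.
        defaultResources.add("hadoop-default.xml");
    }

    // The method proposed in the comment: lets a client project
    // register an extra defaults file by name.
    void addDefaultResource(String name) {
        defaultResources.add(name);
    }

    // Read-only view of the registered default resources, in order.
    List<String> getDefaultResources() {
        return Collections.unmodifiableList(new ArrayList<>(defaultResources));
    }
}

// Hypothetical stand-in for the Nutch utility class: a static factory that
// returns a configuration with Nutch's defaults layered on Hadoop's.
final class NutchConfigurationSketch {
    private NutchConfigurationSketch() {}

    static SketchConfiguration create() {
        SketchConfiguration conf = new SketchConfiguration();
        conf.addDefaultResource("nutch-default.xml"); // assumed file name
        return conf;
    }
}
```

With something along these lines, call sites that used to construct the Nutch-specific class directly would call the static factory instead, and Hadoop itself would need no knowledge of Nutch's file names.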
It should be noted that the name "Nutch" also comes from one of Doug's children.
Closing issues for released versions.