Prototype hadoop-0.16.2 POM. This should be reviewed, and then we can consider whether to publish it to the central repository.
Some notes

> -having a property file driving version numbering of all artifacts

Lucene does this by using the version property from build.xml, so that we don't have to maintain another version file.

> -public releases only: sticking this POM file up on people.apache.org in the right place, along with the JAR and some .md5 checksums

If these are officially released artifacts, can't we just post them with the release, to www.apache.org/dist/hadoop/core? Why do we need to alter our distribution mechanism for Maven?

>> -having a property file driving version numbering of all artifacts
> Lucene does this by using the version property from build.xml, so that we don't have to maintain another version file.

Yes, Hadoop should do that too. Even so, you need another file to drive the versions of all your dependencies: those that are currently encoded in the filenames in /lib (jetty, log4j) or not documented at all (servlet-api.jar).

>> -public releases only: sticking this POM file up on people.apache.org in the right place, along with the JAR and some .md5 checksums
> If these are officially released artifacts, can't we just post them with the release, to www.apache.org/dist/hadoop/core? Why do we need to alter our distribution mechanism for Maven?

It's something that could be done on the side (in a separate build.xml), which takes the signed-off release artifacts and scps them up to people.apache.org. The repository police do check that the JARs put up are officially released, although they don't audit the POMs so thoroughly.

This is a zip file containing nearly everything needed to
-pull in all the hadoop-core dependencies from Ivy
-publish the built file to a local ivy repository
-generate maven2-compatible JAR and POM, both with MD5 checksums

It doesn't make any changes to the existing build; there is a new file, ivybuild.xml, that lives alongside it to do the ivy work. I'm publishing this for people who want to integrate hadoop builds with local Ivy builds, and to start a process of sticking hadoop artifacts up on the apache repositories. It also shows that Ivy can be used to set up Hadoop's classpath, but doesn't make a strong case for actually doing so.

What is useful for many other projects is to put the hadoop-core artifacts into the maven repository, starting with the snapshot. That could be done using a small subset of what we have here, though there's still the problem of no commons-cli, which the command line tools use.

This patch
rmlib.sh is the supporting script to clean up the jar files from the lib folder. With this patch we should be able to use ivy to resolve dependencies through the maven repository, and we can get rid of the local lib folder once all the dependencies are available in the m2 repository. As Steve mentioned, certain libraries are missing from the m2 repo. Those missing dependencies are still added to the classpath from the local lib folder, while the dependencies that are available in the m2 repository are resolved and retrieved by Ivy from the maven repository.

Here I've provided the list of missing dependencies for the different components.
Hadoop-core
Thrift
Chukwa

This patch contains a new set of ivybuild.xml & ivy.xml files for the different components. It also contains a top-level ivysettings.xml file which has the details of all the resolvers and the URLs for resolving the dependencies.
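A top-level ivysettings.xml of the kind described here might look roughly like the following. This is a minimal sketch, not the file from the patch: the resolver name `maven2`, the property name `repo.maven.org` and the repository URL are assumptions made for illustration.

```xml
<ivysettings>
  <!-- Assumed property name and URL; the patch may point somewhere else -->
  <property name="repo.maven.org" value="https://repo1.maven.org/maven2/" override="false"/>

  <!-- Use the resolver named below unless a module is configured otherwise -->
  <settings defaultResolver="maven2"/>

  <resolvers>
    <!-- m2compatible tells Ivy to expect the Maven 2 repository layout -->
    <ibiblio name="maven2" root="${repo.maven.org}" m2compatible="true"/>
  </resolvers>
</ivysettings>
```

Dependencies that are not in the repository would still have to come from the local lib folder, as described above.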
Apply the patch

As of now the patch supports 3 main targets
*compile

To execute the targets

This has compile and package targets as dependencies. Work in progress for other ant targets. Meanwhile I would like to get this patch reviewed.
Thanks,

Thanks Giri!
Steve, can you review this? Does this look like it's on the right track?

This is what I'm working on right now; it publishes the core hadoop artifacts to the local repository, where my other build can grab them. I also take the new artifacts and stick them in an SVN-managed repository so that our Hudson build runs against the releases I make, with some work needed to make sure that my local build doesn't pick up those (usually dated) artifacts, but instead stays up to date with whatever I build.
The versions I have for things are
commons-cli.version=2.0-SNAPSHOT

This hasn't migrated to jetty6 yet, and it used an early release of Ivy; it should work with the latest official release now. The commons-cli version is a problem: there hasn't been an official release there for a while. The solution is to persuade them to release one.

I've had a quick flick through and I'm impressed by the effort that Giri has gone to here. This looks like the basis for moving everything in the core to ivy.
1. It would be good to have everything driven by a single master properties file, with each project having the option to override the values (via its own libraries.properties) but not being required to. This makes it easy to push every app up to a new version of, say, log4j, by changing one file, and it keeps things consistent. Giri's patch shows how inconsistent the projects are regarding versions of things, and that just leads to trouble down the line.
2. Does Chukwa depend on hadoop-core? If it does, there's a good case for the ivy config file of hadoop-core to contain some specific configurations for jetty and jsp support, so that Chukwa can pull them in without having to repeat them. This is what we do in SmartFrog by cross-referencing component packages: only one package is allowed to import a third-party library; everything else has to depend on that package. It works very well when you move to RPM distribution, as the same dependencies and ownership rules apply there.
3. We'd need to go through the ivy reports of everything and make sure that nothing is pulling in transitive dependencies you don't want. If they pull in transitive dependencies you do want, it is safer to declare them and the version you desire. commons-logging is a notorious source of problems here; you should only ever depend on its "master" configuration to avoid stuff you don't need, like avalon-logkit and bits of JMX.
4. src/contrib/hdfsproxy/ivybuild.xml has hard-coded version numbers in the build file. It should be driven from the .properties file.

I need to play with this some more, by patching a clean version of the source tree and seeing how it goes. It's a good design; there are just a few more tweaks we need to get in there.

Thanks Steve for your comments!
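Review points 1 and 3 could be combined in a component's ivy.xml along the following lines. This is only a sketch under assumptions: the module name, the `common` configuration, and the property names `log4j.version` and `commons-logging.version` are hypothetical, and the properties are expected to come from libraries.properties files loaded by the Ant build.

```xml
<ivy-module version="1.0">
  <!-- Hypothetical module coordinates, for illustration only -->
  <info organisation="org.apache.hadoop" module="example-component"/>

  <configurations>
    <conf name="common" description="libraries needed at compile time and run time"/>
  </configurations>

  <dependencies>
    <!-- Versions come from properties, so nothing is hard-coded here -->
    <dependency org="log4j" name="log4j"
                rev="${log4j.version}" conf="common->master"/>
    <!-- Mapping to "master" takes only the artifact itself, not its transitive dependencies -->
    <dependency org="commons-logging" name="commons-logging"
                rev="${commons-logging.version}" conf="common->master"/>
  </dependencies>
</ivy-module>
```

Because Ant properties are immutable once set, a build that loads the component-level libraries.properties before the top-level one would let a component override a shared default without being forced to.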
Here is the approach to address point 1:
We have the top-level libraries.properties and a component-level libraries.properties file.

Another thing that I'm not sure about is "How to decide on the version of components that don't have the version # as part of their jar name?" For example, servlet-api.jar inside the chukwa/lib folder doesn't seem to have a version. This is one such example; we have a lot more like this. We have different dependencies inside Chukwa/lib which don't seem to have a version # either.

To answer point 2

I will address comments 1, 3 and 4 in my next patch. Thanks again for your comments.

> Other thing that I'm not sure is "How to decide on the version of components that doesn't have the version # as part of their jar name?"
Sticking the checksums into Google works most reliably.

Giri, here's the core ivy settings updated to Jetty6 and the last ivy rc out the door. There is another Ivy release being voted on today, but I haven't used that yet.
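For the checksum lookup suggested above for JARs with no version in their name, Ant's built-in `<checksum>` task can print the value to search for. A minimal sketch, assuming the JAR sits at the hypothetical path lib/servlet-api.jar; the project, target and property names are made up for illustration:

```xml
<project name="jar-checksum-sketch" default="print-checksum" basedir=".">
  <!-- Hypothetical location of the unversioned JAR -->
  <property name="jar.file" location="lib/servlet-api.jar"/>

  <target name="print-checksum">
    <!-- Stores the MD5 digest in a property instead of writing a .md5 file -->
    <checksum file="${jar.file}" algorithm="MD5" property="jar.md5"/>
    <echo message="MD5 of ${jar.file}: ${jar.md5}"/>
  </target>
</project>
```

The printed digest can then be searched for, or compared against the .md5 files published next to candidate releases in a repository.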
I think

This patch incorporates the changes suggested by Steve. It also has fixes for most of the ant targets except for forrest, as the forrest jar is yet to be published.
Please review the patch. I'm also supplying a supporting script for removing the jar files which would be resolved by ivy. Thanks again to Steve for his comments.
Giri

Shouldn't this replace build.xml, not add ivybuild.xml?
v3 version of patch comprises
Thanks,

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12396071/rmlib-v3.sh
against trunk revision 726129.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3744/console

This message is automatically generated.

I'm resubmitting the patch
Thanks,
Giri

Here is the final version of the patch, which enables ivy for most of the targets, except for findbugs and the target which uses forrest for doc generation.
Thanks,

I just committed this, with two changes:
http://people.apache.org/~kalle/mahout/maven2/org/apache/hadoop/core/0.17.0-SNAPSHOT/core-0.17.0-20080315.201857-1.pom
This declares an explicit dependency on most of hadoop-core/lib, and does not indicate which dependencies are optional. It is not ideal.
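In a Maven 2 POM, a dependency can be flagged as optional so that downstream projects do not inherit it automatically. The fragment below is a sketch of that mechanism only; treating jetty as optional and the `jetty.version` property are assumptions for illustration, not a statement about which hadoop-core libraries really are optional.

```xml
<dependencies>
  <!-- Assumed example: which libraries are truly optional still has to be decided -->
  <dependency>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>jetty</artifactId>
    <version>${jetty.version}</version>
    <!-- Not passed on transitively to projects that depend on this artifact -->
    <optional>true</optional>
  </dependency>
</dependencies>
```

Projects that need an optional library then declare it themselves, which keeps their classpaths free of everything in hadoop-core/lib that they never use.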