Giri, any updates on this patch? It would be really nice to get people to use resolver=local to do development across the common, hdfs and mapreduce sub-projects.
A few comments after playing with this:
> I was having problems getting the build to use the version of Ivy it downloads.
Avro handles this better: http://svn.apache.org/viewvc/hadoop/avro/trunk/build.xml?view=annotate#l128 Ivy's jar is stored in the lib/ directory. If the specified version of Ivy isn't there, it removes all versions before downloading. Also, one need never specify offline=true, since, so long as the specified version is there, it doesn't contact the network.
Giri's had too many problems trying to publish Hadoop jars to the Apache Maven repo using Ivy. He's now going to change direction and try the Maven Ant tasks and individual POM files (one for every jar file) to replace the Ivy functionality. If this works, he'll propagate the work to the other Hadoop subprojects. As part of this, the maven-ant-tasks.jar would be checked into the lib directory.
I'd be curious to hear what the problems are, and what it means to publish using Ivy. I publish Avro's jar with scp.
I was able to use ivy to publish ivy.xml and the hadoop jars to a local filesystem using the filesystem resolver, and to people.apache.org:/home/<myhome> using the scp resolver.
And everyone, including me, had concerns about publishing to the home folder using the ivy scp resolver. I tried using the repository.apache.org maven repository (nexus) for publishing the ivy artifacts (ivy.xml and hadoop.jar), but I couldn't publish even to the snapshot repository; I get a forbidden error. (I verified the authentication, as my userid has access to the maven repo.) When I tried publishing the same way to a local nexus instance, it worked. Now I'm trying out the maven ant tasks.
This patch uses mvn-ant-task for publishing artifacts to the local filesystem and to the apache-snapshots repo.
Instructions: apply the common-trunk.patch
To publish hdfs jar to the local filesystem repo by resolving commons jar from local filesystem repo.
To publish mapred jar to the local filesystem repo by resolving common/hdfs jar from local filesystem repo.
common/hdfs/mapred artifacts are already published to the apache repository. If you want to just use the artifacts from the apache repo, you can just omit the -Dresolvers argument.
IMPORTANT: If you want to switch between the internal and apache snapshot repository, you are expected to call the ant clean-cache target and then set the resolvers to internal or apache snapshot repository as mentioned above.
Common patch builds fine.
ISSUES OPEN: MAPRED: HDFS:
I need help to debug and fix, or we can open up separate JIRAs to address those failures.
mapreduce-trunk-v2.patch fixes the sqoop test failure as well, thanks to Aaron for the debugging tip.
attached hdfs-trunk-v2.patch and mapreduce-trunk-v3.patch, which help in resolving artifacts from the local fs when -Dresolvers=internal is passed. If not found there, they are resolved from apache-snapshots rather than failing the build.
tnx
uploaded v4 version of patch which works with the latest trunk
tnx
I tested this out. I did:
1. Made a change to a common file
2. Compiled and published it locally using ant mvn-install
3. Made some changes to mapreduce source to use the change I did in the common file
4. Compiled using ant -Dresolvers=internal
The compilation went through fine. Tested mapred with and without -Dresolvers=internal. Verified that the ~/.ivy2 cache gets populated from the local ~/.m2 repository if -Dresolvers=internal is passed AND the local repository is present. Otherwise it downloads from the apache repository.
nit: It would be good to add description to the newly added targets so that those can be printed in "ant -p" help. Looked at the patch overall and tested it. Works fine when connected.
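For the nit about "ant -p": a minimal sketch of what adding descriptions could look like. The target names echo ones mentioned in this thread, but the bodies, the depends list and the description text are placeholders of mine, not taken from the patch.

```xml
<!-- Hypothetical fragment: only the description attribute is the point here. -->
<project name="description-sketch" default="mvn-install">

  <!-- Targets that carry a description are listed as main targets by "ant -p". -->
  <target name="jar" description="Build the project jar">
    <echo message="placeholder for the real jar step"/>
  </target>

  <target name="mvn-install" depends="jar"
          description="Install the built jar into the local Maven repository">
    <echo message="placeholder for the real install step"/>
  </target>

</project>
```

Running "ant -p" against a file like this should print both targets together with their description text.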
One major point: the patch assumes upfront that the dependency order is common->hdfs->mapred. There is another alternative: common->mapred and common->hdfs. In many local discussions, I didn't hear one final conclusion regarding this. Please also see HDFS-641. The first approach pushes some hdfs-specific tests/benchmarks into mapred, which may or may not be very correct. The second one still leaves the location of these tests/benchmarks unanswered. What does the community think about this? Your failures w.r.t. run-test-hdfs-with-mr and others should be connected to the above point, I guess. As for the patch review comments, I didn't go through each and every line, so many nits may be missing from the review.
ivy doesn't work offline. Every time we do a build, whether or not the dependencies are present in the cache, it goes and verifies the repo. If the dependencies are present locally it doesn't download them. The same is the case with mvn-ant-task.jar: it doesn't download the jar every time, as usetimestamp is set to true.
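To illustrate the usetimestamp behaviour described above, here is a hedged sketch of a download target. The property names, the target name, the version and the URL are assumptions for illustration and may not match what the patch uses.

```xml
<!-- Hypothetical fragment: names, version and URL are assumed values. -->
<project name="get-sketch" default="mvn-taskdef-download">

  <property name="lib.dir" value="${basedir}/lib"/>
  <property name="ant-task.version" value="2.0.10"/>
  <property name="ant-task.jar"
            value="${lib.dir}/maven-ant-tasks-${ant-task.version}.jar"/>
  <property name="ant-task.url"
            value="https://repo1.maven.org/maven2/org/apache/maven/maven-ant-tasks/${ant-task.version}/maven-ant-tasks-${ant-task.version}.jar"/>

  <target name="mvn-taskdef-download"
          description="Fetch the Maven Ant tasks jar if the local copy is stale">
    <mkdir dir="${lib.dir}"/>
    <!-- usetimestamp="true" still contacts the server, but the jar is only
         transferred when the remote copy is newer than the local file. -->
    <get src="${ant-task.url}" dest="${ant-task.jar}" usetimestamp="true"/>
  </target>

</project>
```

Note that this still needs a network connection on every build, which matches the point that neither Ivy nor this download step is truly offline.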
When dependencies are put in a single line the ivy.xml file looks refined and re-formatting would greatly help in understanding.
This patch uses maven and ivy for publishing and resolving, respectively. Ivy works on configurations while maven works on scopes. I've tried my best to utilize the best of both worlds.
Until the last couple of days hdfs depended on both mapred and common, and mapred depended on hdfs and common. Hence we had a situation where we could publish only the mapred and hdfs jars and not the corresponding test jars. I didn't want to re-use the mvn-install-mapred target, as I was expecting to clean up this target once the circular dependency issue is resolved.
That would be quite a bit of work, and I would definitely want that to be in a different jira.
It's not just the jar files that the cache stores; it also converts the poms and stores them as ivy.xml files for different ivy configurations. And the best way to clean them up is to clean the corresponding artifact folder in the cache.
When I call ant clean I would definitely expect a clean workspace.
Thanks for the comments. Uploaded a patch which addresses comments from shared and vinod.
Looks like these 3 files got dropped in the V7 patch for common. I think they're still needed.
hadoop-core-template.xml
common-tunk-v8.patch fixes the missing pom files
The patch works overall.
It works like that on trunk. After the first run, I can go offline and still do my work. I think it works this way because we specify particular versioned jars, and so ivy actually doesn't go to the repo every time. This might change if we wish to use snapshot jars of common/mapred/hdfs.
Created MAPREDUCE-1101 for the same.
Then we may wish to clean the ivy.jar too when we do ant clean. Also, as Giri already has mentioned, we will need a follow-up issue to clean up the list of dependencies, particularly of the contrib projects.
-1 to deleting downloaded dependencies on 'ant clean'; if you're working offline, there are plenty of times you want to clean your own build intermediates, but don't want to inadvertently blow away your entire ability to compile. Maybe a separate 'depclean' target should make things Really, Really Clean.
I agree with Aaron. Let's not wipe the downloaded jars on 'ant clean'. I'd propose 'veryclean' since that matches the C usage.
patch v9 which incorporates offline and veryclean
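A rough sketch of how the clean/veryclean split and the offline switch discussed above could be wired up in Ant. Only the target name veryclean and the offline property come from this thread; the other target names, the property names and the cache path are assumptions, not the contents of patch v9.

```xml
<!-- Hypothetical fragment: paths and most names are assumed for illustration. -->
<project name="clean-sketch" default="clean">

  <property name="build.dir" value="${basedir}/build"/>
  <property name="ivy.jar" value="${basedir}/ivy/ivy.jar"/>
  <!-- Assumed location of the Hadoop entries in the default Ivy cache. -->
  <property name="hadoop.cache.dir"
            value="${user.home}/.ivy2/cache/org.apache.hadoop"/>

  <target name="clean" description="Remove build intermediates only">
    <delete dir="${build.dir}"/>
  </target>

  <target name="veryclean" depends="clean"
          description="Also remove the downloaded Ivy jar and cached Hadoop artifacts">
    <delete file="${ivy.jar}"/>
    <delete dir="${hadoop.cache.dir}"/>
  </target>

  <!-- Skipped entirely when the build is run with -Doffline=true. -->
  <target name="ivy-download" unless="offline">
    <echo message="placeholder for the real Ivy download step"/>
  </target>

</project>
```

With this shape, "ant clean" leaves downloaded dependencies alone, so an offline developer can still compile afterwards, and "ant veryclean" is the explicit opt-in for removing them.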
updated mapred v 9.1 patch which applies to the current trunk, thanks to Lee for testing the patch with the latest trunk.
I'm changing -0.22.0-dev-SNAPSHOT to -0.22.0-SNAPSHOT.
How do we publish the jars to Apache's maven repository?
Integrated in Hadoop-Common-trunk-Commit #73 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/73/)
Use Maven ant tasks to publish the subproject jars. (Giridharan Kesavan via omalley)
Integrated in Hadoop-Common-trunk-Commit #74 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/74/)
Remove generated files from subversion.
Ok, I've committed it to common.
It broke HDFS once I deleted the lib/hadoop-core-*.jar files correctly. I reverted it so that trunk isn't broken for HDFS. I have an interview and meeting this afternoon, but will try to fix it after that.
Integrated in Hadoop-Hdfs-trunk-Commit #88 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/88/)
Revert changes because the new raid contrib module broke this patch. (I missed this because I forgot to delete the lib/hadoop-core-*.jar files in my testing.)
Running mvn deploy as the hudson user on any of the apache build servers would publish jars to the Apache mvn repo.
Integrated in Hadoop-Common-trunk #143 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/143/)
Remove generated files from subversion.
Use Maven ant tasks to publish the subproject jars. (Giridharan Kesavan via omalley)
It seems like this has resulted in broken HDFS trunk.
Integrated in Hadoop-Hdfs-trunk-Commit #96 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/96/
uploaded mapred-trunk.v9.2 version of patch that also addresses publishing artifacts to snapshots and to the staging repo.
To publish common's jars to the local repository, i.e. /home/<username>/ivyrepo:
cd common-trunk
apply common-trunk.patch
ant ivy-publish-local
This publishes the hadoop-core and hadoop-core-test jars to the local filesystem based repository.

cd hdfs-trunk
apply hdfs-trunk.patch
ant ivy-publish-local -Dresolver=local
This publishes the hdfs jars to the local filesystem based repository.
The -Dresolver=local option tells ivy to resolve the common jars from the local filesystem based repository.

cd mapreduce-trunk
apply mapreduce-trunk.patch
ant ivy-publish-local -Dresolver=local
This publishes the mapred jars to the local filesystem based repository.
The -Dresolver=local option tells ivy to resolve the common and hdfs jars from the local filesystem based repository.

This patch also has an ssh based resolver that publishes artifacts to the people server's home folder, but that requires authentication.
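For readers wondering how a -Dresolver=local switch can select the local filesystem repository, here is a hedged ivysettings.xml sketch. The resolver names, the repo.dir property, the directory layout patterns and the org.apache.hadoop module rule are my assumptions for illustration; the actual settings file in the patch may differ.

```xml
<!-- Hypothetical ivysettings.xml: names and patterns are assumed. -->
<ivysettings>

  <!-- Fallbacks, used only when the property is not already set,
       for example via -Dresolver=local on the ant command line. -->
  <property name="resolver" value="maven2" override="false"/>
  <property name="repo.dir" value="${user.home}/ivyrepo" override="false"/>

  <!-- Third-party dependencies always come from the public Maven repository. -->
  <settings defaultResolver="maven2"/>

  <resolvers>
    <!-- Filesystem repository that "ant ivy-publish-local" would write to. -->
    <filesystem name="local">
      <ivy pattern="${repo.dir}/[organisation]/[module]/[revision]/ivy.xml"/>
      <artifact pattern="${repo.dir}/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"/>
    </filesystem>
    <ibiblio name="maven2" m2compatible="true"/>
  </resolvers>

  <modules>
    <!-- Only Hadoop's own modules follow the resolver switch. -->
    <module organisation="org.apache.hadoop" name=".*" resolver="${resolver}"/>
  </modules>

</ivysettings>
```

Scoping the switch to the Hadoop organisation is a design choice in this sketch: it lets the common and hdfs jars come from the local repository while everything else is still resolved from the public one.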