Hadoop Common
HADOOP-5107

split the core, hdfs, and mapred jars from each other and publish them independently to the Maven repository

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.21.0
    • Component/s: build
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      I think that, to support splitting the projects, we should publish the jars for 0.20.0 as independent jars to the Maven repository.

      Attachments

      1. common-trunk.patch
        18 kB
        Giridharan Kesavan
      2. common-trunk-v1.patch
        26 kB
        Giridharan Kesavan
      3. common-trunk-v4.patch
        26 kB
        Giridharan Kesavan
      4. common-trunk-v6.patch
        39 kB
        Giridharan Kesavan
      5. common-trunk-v7.patch
        24 kB
        Giridharan Kesavan
      6. common-trunk-v8.patch
        39 kB
        Giridharan Kesavan
      7. common-trunk-v9.patch
        31 kB
        Giridharan Kesavan
      8. hadoop-hdfsd-v4.patch
        42 kB
        Giridharan Kesavan
      9. hdfs-trunk.patch
        21 kB
        Giridharan Kesavan
      10. hdfs-trunk-v1.patch
        45 kB
        Giridharan Kesavan
      11. hdfs-trunk-v2.patch
        46 kB
        Giridharan Kesavan
      12. hdfs-trunk-v6.patch
        34 kB
        Giridharan Kesavan
      13. hdfs-trunk-v9.patch
        42 kB
        Giridharan Kesavan
      14. mapred-trunk-v1.patch
        70 kB
        Giridharan Kesavan
      15. mapred-trunk-v2.patch
        55 kB
        Giridharan Kesavan
      16. mapred-trunk-v3.patch
        55 kB
        Giridharan Kesavan
      17. mapred-trunk-v4.patch
        74 kB
        Giridharan Kesavan
      18. mapred-trunk-v5.patch
        73 kB
        Giridharan Kesavan
      19. mapred-trunk-v6.patch
        72 kB
        Giridharan Kesavan
      20. mapred-trunk-v9.1.patch
        72 kB
        Giridharan Kesavan
      21. mapred-trunk-v9.2.patch
        72 kB
        Giridharan Kesavan
      22. mapred-trunk-v9.patch
        74 kB
        Giridharan Kesavan
      23. mapreduce-trunk.patch
        38 kB
        Giridharan Kesavan

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #162 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/162/)

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #134 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/134/)

          Owen O'Malley added a comment -

          I just committed this. Thanks, Giri!

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #72 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #136 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/136/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #100 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/100/)

          Giridharan Kesavan added a comment -

          Uploaded the mapred-trunk-v9.2 version of the patch, which also addresses publishing artifacts to the snapshots and staging repos.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #96 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/96/)

          Konstantin Boudnik added a comment -

          It seems like this has resulted in a broken HDFS trunk.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #143 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/143/)
          HADOOP-5107. Remove generated files from subversion.
          HADOOP-5107. Use Maven ant tasks to publish the subproject jars.
          (Giridharan Kesavan via omalley)

          Giridharan Kesavan added a comment -

          > How do we publish the jars to Apache's maven repository?

          Running mvn deploy as the hudson user on any of the apache build servers will publish the jars to the Apache mvn repo.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #88 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/88/)
          HADOOP-5107. Revert changes because the new raid contrib module broke this
          patch. (I missed this because I forgot to delete the lib/hadoop-core-*.jar
          files in my testing.)

          Owen O'Malley added a comment -

          Ok, I've committed it to common.

          It broke HDFS once I deleted the lib/hadoop-core-*.jar files correctly. I reverted it so that trunk isn't broken for HDFS. I have an interview and meeting this afternoon, but will try to fix it after that.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #74 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/74/)
          HADOOP-5107. Remove generated files from subversion.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #73 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/73/)
          HADOOP-5107. Use Maven ant tasks to publish the subproject jars.
          (Giridharan Kesavan via omalley)

          Owen O'Malley added a comment -

          I'm changing -0.22.0-dev-SNAPSHOT to -0.22.0-SNAPSHOT.

          How do we publish the jars to Apache's maven repository?

          Giridharan Kesavan added a comment -

          Updated the mapred v9.1 patch, which applies to the current trunk. Thanks to Lee for testing the patch against the latest trunk.

          Giridharan Kesavan added a comment -

          Uploaded patch v9, which incorporates the offline support and the veryclean target.

          Owen O'Malley added a comment -

          I agree with Aaron. Let's not wipe the downloaded jars on 'ant clean'. I'd propose 'veryclean' since that matches the C usage.

          Aaron Kimball added a comment -

          -1 to deleting downloaded dependencies on 'ant clean'; if you're working offline, there are plenty of times you want to clean your own build intermediates, but don't want to inadvertently blow away your entire ability to compile. Maybe a separate 'depclean' target should make things Really, Really Clean.
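
          (For illustration only: a minimal sketch of how a 'clean'/'veryclean' split could look in build.xml; the property names ${build.dir}, ${ivy.jar} and ${mvn.ant.task.jar} are assumptions, not taken from the actual patch.)

           <!-- normal clean: remove only build intermediates -->
           <target name="clean" description="Remove build intermediates">
             <delete dir="${build.dir}"/>
           </target>

           <!-- veryclean: additionally wipe the downloaded build-tool jars -->
           <target name="veryclean" depends="clean"
                   description="Remove build intermediates and downloaded build-tool jars">
             <delete file="${ivy.jar}"/>
             <delete file="${mvn.ant.task.jar}"/>
           </target>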

          Vinod Kumar Vavilapalli added a comment -

          The patch works overall.

          > ivy doesn't work offline. Every time we do a build, whether the dependencies are present in the cache or not, it goes and verifies the repo. If the dependencies are present locally it doesn't download them. Same is the case with mvn-ant-task.jar: it doesn't download the jar every time, as usetimestamp is set to true.

          It works like that on trunk. After the first run, I can go offline and still do my work. I think it works this way because we specify particular versioned jars, so ivy doesn't actually go to the repo every time. This might change if we wish to use snapshot jars of common/mapred/hdfs.

          > common project: Should we take this as an opportunity and rename the core jar to common jar before publishing? It looks odd that the project name is common while the jar's name refers to core.
          >>>> That would be quite a lot of work and I would definitely want that to be in a different jira.

          Created MAPREDUCE-1101 for the same.

          > Should `ant clean` delete maven-ant-tasks.jar every time? I guess not.
          >>>> When I call ant clean I would definitely expect a clean workspace. Also there is a different reason: I've seen people doing a ctrl-c halfway while the ivy/maven-ant-tasks jar is downloading, so the jar is partially downloaded. The next time a user runs the build and it fails because the jar file is corrupt, they have to go delete it manually.

          Then we may wish to clean the ivy jar too when we do ant clean.

          Also, as Giri has already mentioned, we will need a follow-up issue to clean up the list of dependencies, particularly for the contrib projects.
          In any case, this issue is still blocked on the whole common/hdfs/mapred dependency question. Just putting these comments in now, so we are ready.

          Giridharan Kesavan added a comment -

          common-trunk-v8.patch fixes the missing pom files.

          Lee Tucker added a comment -

          Looks like these 3 files got dropped in the V7 patch for common. I think they're still needed.

          hadoop-core-template.xml
          hadoop-core-test-template.xml
          hadoop-core.pom

          Giridharan Kesavan added a comment -

          Uploaded a patch which addresses the comments from Sharad and Vinod.

          Giridharan Kesavan added a comment -

          > The patch doesn't work when we go off-line for subsequent runs. The off-line feature is missing in all the projects. Without this feature, it tries to download maven-ant-tasks.jar itself again and gets stuck.

          ivy doesn't work offline. Every time we do a build, whether the dependencies are present in the cache or not, it goes and verifies the repo. If the dependencies are present locally it doesn't download them. Same is the case with mvn-ant-task.jar: it doesn't download the jar every time, as usetimestamp is set to true.

          > In many files, in particular the ivy.xml files of contrib projects, most of the changes are not required and are redundant, as the patch removes them and simply adds them again, changing the format into a single line. Undoing these changes will greatly reduce the patch size.

          When dependencies are put on a single line, the ivy.xml file looks refined, and the re-formatting greatly helps in understanding.

          > In the mapreduce and hdfs ivy.xml files, some cleanup is done. The earlier client- and server-specific dependencies looked good and natural too. Did you remove that because the classification was premature, or because it didn't gel well with your changes?

          This patch uses maven for publishing and ivy for resolving. Ivy works on configurations while maven works on scopes; I've tried my best to use the best of both worlds.

          > mapreduce build.xml: Do we need separate mvn-install and mvn-install-mapred? Even if it is needed, mvn-install should depend on mvn-install-mapred. A case of reuse.

          Until the last couple of days, hdfs depended on both mapred and common, and mapred depended on hdfs and common. Hence we had a situation where we could publish only the mapred and hdfs jars and not the corresponding test jars. I didn't want to re-use the mvn-install-mapred target, as I expected to clean up this target once the circular dependency issue is resolved.

          > common project: Should we take this as an opportunity and rename the core jar to common jar before publishing? It looks odd that the project name is common while the jar's name refers to core.

          That would be quite a lot of work and I would definitely want that to be in a different jira.

          > I think that in both mapred and hdfs, clean-cache should not delete the whole ${user.home}/.ivy2/cache/org.apache.hadoop/hadoop-core directory, for example. It works for now, but different projects may work with different versions of the jar, so mapred's clean-cache should only delete the corresponding version of the jar. Same with the other directories in the cache. Thoughts?

          It's not just the jar files that the cache stores; it also converts the poms and stores them as ivy.xml files for the different ivy configurations. The best way to clean them up is to clean the corresponding artifact folder in the cache.

          > Should `ant clean` delete maven-ant-tasks.jar every time? I guess not.

          When I call ant clean I would definitely expect a clean workspace. Also there is a different reason: I've seen people doing a ctrl-c halfway while the ivy/maven-ant-tasks jar is downloading, so the jar is partially downloaded. The next time a user runs the build and it fails because the jar file is corrupt, they have to go delete it manually.

          Thanks for the comments.

          Vinod Kumar Vavilapalli added a comment -

          As for the patch review comments, I didn't go through each and every line, so some nits may be missing from the review.

          • The patch doesn't work when we go off-line for subsequent runs. The off-line feature is missing in all the projects. Without this feature, it tries to download maven-ant-tasks.jar itself again and gets stuck.
          • Minor: Wrap lines longer than 80 characters.
          • In many files, in particular the ivy.xml files of contrib projects, most of the changes are not required and are redundant, as the patch removes them and simply adds them again, changing the format into a single line. Undoing these changes will greatly reduce the patch size.
          • In the mapreduce and hdfs ivy.xml files, some cleanup is done. The earlier client- and server-specific dependencies looked good and natural too. Did you remove that because the classification was premature, or because it didn't gel well with your changes?
          • In all the projects' build files, in the setversion target, replaceregexp can be done in a single go for all the POMs. It takes a fileset and so doesn't need separate replaceregexp tasks (see the sketch after this list).
          • Remove the hadoop-core.pom file from common; it's no longer required.
          • Bump ivy.version to 2.1.0-rc1 in the mapreduce and hdfs projects also? The patch bumps it for the common project.
          • mapreduce build.xml: Do we need separate mvn-install and mvn-install-mapred? Even if it is needed, mvn-install should depend on mvn-install-mapred. A case of reuse.
          • common project: Should we take this as an opportunity and rename the core jar to common jar before publishing? It looks odd that the project name is common while the jar's name refers to core.
          • I think that in both mapred and hdfs, clean-cache should not delete the whole ${user.home}/.ivy2/cache/org.apache.hadoop/hadoop-core directory, for example. It works for now, but different projects may work with different versions of the jar, so mapred's clean-cache should only delete the corresponding version of the jar. Same with the other directories in the cache. Thoughts?
          • Should `ant clean` delete maven-ant-tasks.jar every time? I guess not.
          • Add the pom files in the ivy directory (e.g. ivy/hadoop-mapred-examples.xml) to svn/git ignore?
          • As Sharad already commented, can we put in nice descriptions for the new targets? Of course, we will not need these for internal-only targets like mvn-taskdef.
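
          (A minimal sketch of the single-pass replaceregexp suggested above; the fileset location and the @version token are illustrative assumptions, not taken from the patch.)

           <target name="setversion">
             <!-- one replaceregexp over all POM templates instead of one task per file -->
             <replaceregexp match="@version" replace="${version}" flags="g" byline="true">
               <fileset dir="${basedir}/ivy" includes="*.pom *-template.xml"/>
             </replaceregexp>
           </target>
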
          Vinod Kumar Vavilapalli added a comment -

          Looked at the patch overall and tested it. Works fine when connected.

          One major point: the patch assumes upfront that the dependency order is common->hdfs->mapred. There is another alternative: common->mapred and common->hdfs. In many local discussions, I didn't hear one final conclusion regarding this. Please also see HDFS-641. The first approach pushes some hdfs-specific tests/benchmarks into mapred, which may or may not be very correct. The second one still leaves the location of these tests/benchmarks unanswered. What does the community think about this?

          Your failures w.r.t. run-test-hdfs-with-mr and the others should be connected to the above point, I guess.

          Sharad Agarwal added a comment -

          Tested mapred with and without -Dresolvers=internal. Verified that the ~/.ivy2 cache gets populated from the local ~/.m2 repository if -Dresolvers=internal is passed AND the local repository is present. Otherwise it downloads from the apache repository.
          nit: It would be good to add descriptions to the newly added targets so that they can be printed in the "ant -p" help.

          Giridharan Kesavan added a comment -

          Uploaded the mapred v5 patch.

          Jothi Padmanabhan added a comment -

          I tested this out. I did the following:
          1. Made a change to a common file.
          2. Compiled and published it locally using

           ant mvn-install 

          3. Made some changes to the mapreduce source to use the change I made in the common file.
          4. Compiled using

           ant -Dresolvers=internal 

          The compilation went through fine.

          Giridharan Kesavan added a comment -

          Uploaded the v4 version of the patch, which works with the latest trunk. Thanks.

          Giridharan Kesavan added a comment -

          Attached hdfs-trunk-v2.patch and mapreduce-trunk-v3.patch, which help in resolving artifacts from the local filesystem when -Dresolvers=internal is passed. If an artifact is not found, they resolve it from apache-snapshots rather than failing the build. Thanks.

          Giridharan Kesavan added a comment -

          mapreduce-trunk-v2.patch fixes the sqoop test failure as well; thanks to Aaron for the debugging tip.

          Giridharan Kesavan added a comment -

          This patch uses mvn-ant-task for publishing artifacts to the local filesystem and to the apache-snapshots repo.

          Instructions:
          To publish the common jars to the local filesystem repo:

          apply the common-trunk.patch
          ant mvn-install - this publishes the common jars to the local filesystem-based repo.

          To publish the hdfs jar to the local filesystem repo, resolving the common jar from the local filesystem repo:
          apply hdfs-trunk.patch
          ant mvn-install -Dresolvers=internal

          To publish the mapred jar to the local filesystem repo, resolving the common/hdfs jars from the local filesystem repo:
          apply mapred-trunk.patch
          ant mvn-install -Dresolvers=internal

          The common/hdfs/mapred artifacts are already published to the Apache repository.

          If you just want to use the artifacts from the Apache repo, you can simply omit the -Dresolvers argument.
          By default ivy is configured to use the Apache snapshot repository.

          IMPORTANT

          If you want to switch between the internal and Apache snapshot repositories, you are expected to call the ant clean-cache target and then set the resolvers to the internal or Apache snapshot repository as mentioned above.

          The common patch builds fine.

          ISSUES OPEN:

          MAPRED:
          builds fine.
          The sqoop contrib test fails for TestAutoProgressMapRunner - 2 errors.

          HDFS:
          run-test-hdfs-with-mr - 1 failure with TestServiceLevelAuthorization
          run-test-hdfs-fault-inject - 13 failures with TestFiDataTransferProtocol

          I need help to debug and fix these, or we can open up separate jiras to address those failures.
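
          (To make the workflow concrete: a minimal sketch of what an mvn-install target built on the maven ant tasks can look like; the property names and the pom path are hypothetical, not copied from the patch.)

           <target name="mvn-install" depends="jar">
             <!-- load the maven ant tasks from the downloaded jar -->
             <typedef resource="org/apache/maven/artifact/ant/antlib.xml"
                      uri="urn:maven-artifact-ant"
                      classpath="${mvn.ant.task.jar}"/>
             <!-- install the built jar into the local repository using its POM -->
             <artifact:pom id="hadoop.core" file="${basedir}/ivy/hadoop-core.pom"
                           xmlns:artifact="urn:maven-artifact-ant"/>
             <artifact:install file="${build.dir}/${final.name}.jar"
                               xmlns:artifact="urn:maven-artifact-ant">
               <pom refid="hadoop.core"/>
             </artifact:install>
           </target>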

          Giridharan Kesavan added a comment -

          I was able to use ivy to publish the ivy.xml files and hadoop jars to a local filesystem using a filesystem resolver, and via an scp resolver to people.apache.org:/home/<myhome>.

          Everyone had concerns about publishing to the home folder with the ivy scp resolver, including me.

          I tried using the repository.apache.org maven repository (nexus) for publishing the ivy artifacts (ivy.xml and hadoop.jar), but I couldn't publish even to the snapshot repository; I get a forbidden error. (I verified the authentication, as my userid has access to the maven repo.)

          When I tried the same publishing against a local nexus instance, I was able to publish.
          After discussions with Brian (the nexus repo admin), it looks like the apache maven repo follows maven standards, and he is not sure about publishing ivy.xml files to this repo.

          Now I'm trying out the maven ant tasks.

          Doug Cutting added a comment -

          I'd be curious to hear what the problems are, and what it means to publish using Ivy. I publish Avro's jar with scp.

          Nigel Daley added a comment -

          Giri's had too many problems trying to publish Hadoop jars to the Apache Maven repo using Ivy. He's now going to change direction and try the Maven Ant tasks and individual POM files (one for every jar file) to replace the Ivy functionality. If this works, he'll propagate the work to the other Hadoop subprojects. As part of this, the maven-ant-tasks.jar would be checked into the lib directory.

          Doug Cutting added a comment -

          > I was having problems getting the build to use the version of Ivy it downloads.

          Avro handles this better:

          http://svn.apache.org/viewvc/hadoop/avro/trunk/build.xml?view=annotate#l128

          Ivy's jar is stored in the lib/ directory. If the specified version of Ivy isn't there, it removes all versions before downloading. Also, one need never specify offline=true, since, so long as the specified version is there, it doesn't contact the network.
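
          (A minimal sketch of the pattern described above; the version number and download URL are illustrative, not the exact Avro build.xml.)

           <property name="ivy.version" value="2.1.0-rc1"/>
           <property name="ivy.jar" location="lib/ivy-${ivy.version}.jar"/>

           <target name="ivy-check">
             <available file="${ivy.jar}" property="ivy.jar.found"/>
           </target>

           <!-- only touches the network when the pinned Ivy version is missing -->
           <target name="ivy-download" depends="ivy-check" unless="ivy.jar.found">
             <!-- remove other Ivy versions so a single, known jar is picked up -->
             <delete>
               <fileset dir="lib" includes="ivy-*.jar" excludes="ivy-${ivy.version}.jar"/>
             </delete>
             <get src="http://repo2.maven.org/maven2/org/apache/ivy/ivy/${ivy.version}/ivy-${ivy.version}.jar"
                  dest="${ivy.jar}" usetimestamp="true"/>
           </target>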

          Tom White added a comment -

          A few comments after playing with this:

          • With these changes we should be able to use Ivy's support for transitive dependencies, so (amongst other things) we could remove the jets3t dependency from HDFS. Cleaning up the direct dependencies probably belongs in a follow-up issue.
          • I was having problems getting the build to use the version of Ivy it downloads. This was due to an older version in ~/.ant which was being picked up. It worked after I deleted it. (I just mention this in case others have the same problem - no changes are required to the patches.)
          • I had to make a few changes to ivysettings.xml:
            • Define a maven2.pattern (it was having problems retrieving the RAT jar without it):
              <property name="maven2.pattern" value="[organisation]/[module]/[revision]/[module]-[revision]"/>
            • Define a default username (empty), since without it you get the error "The uri is in the wrong format" (for e.g. /home/${username}/ivyrepo/commons-cli/commons-cli/1.1/ivys/ivy.xml):
              <property name="username" value=""/>
          Arun C Murthy added a comment -

          Giri, any updates on this patch? It would be really nice to get people using resolver=local for development across the common, hdfs and mapreduce sub-projects.

          Giridharan Kesavan added a comment -

          Steps:

          To publish the common jars to the local repository, i.e. /home/<username>/ivyrepo:
          cd common-trunk
          apply common-trunk.patch
          ant ivy-publish-local
          This publishes the hadoop-core and hadoop-core-test jars to the local filesystem-based repository.

          cd hdfs-trunk
          apply hdfs-trunk.patch
          ant ivy-publish-local -Dresolver=local
          This publishes the hdfs jars to the local filesystem-based repository.
          The -Dresolver=local option tells ivy to resolve the common jars from the local filesystem-based repository.

          cd mapreduce-trunk
          apply mapreduce-trunk.patch
          ant ivy-publish-local -Dresolver=local
          This publishes the mapred jars to the local filesystem-based repository.
          The -Dresolver=local option tells ivy to resolve the common and hdfs jars from the local filesystem-based repository.

          This patch also has an ssh-based resolver that publishes artifacts to the home folder on the people server, but that requires authentication. (A sketch of the filesystem resolver appears after this comment.)
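
          (To illustrate the mechanics: a sketch of the kind of filesystem resolver and publish call involved; the patterns are inferred from the repository paths mentioned in this thread, not copied from the patch.)

           <!-- ivysettings.xml: resolver writing to the local ivyrepo -->
           <filesystem name="local">
             <ivy pattern="${user.home}/ivyrepo/[organisation]/[module]/[revision]/ivys/ivy.xml"/>
             <artifact pattern="${user.home}/ivyrepo/[organisation]/[module]/[revision]/[type]s/[artifact]-[revision].[ext]"/>
           </filesystem>

           <!-- build.xml: publish the built jars through that resolver -->
           <!-- assumes xmlns:ivy="antlib:org.apache.ivy.ant" is declared on <project> -->
           <ivy:publish resolver="local" pubrevision="${version}" overwrite="true">
             <artifacts pattern="${build.dir}/[artifact]-[revision].[ext]"/>
           </ivy:publish>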


            People

            • Assignee: Giridharan Kesavan
            • Reporter: Owen O'Malley
            • Votes: 2
            • Watchers: 28
