Bigtop / BIGTOP-848

Allow to build stack on top of an arbitrary Hadoop SHA

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.8.0
    • Component/s: general
    • Labels:
      None

      Description

      For ease of experimenting and doing one-off stack builds, which has always been a founding objective of this project, I would like to have a way to build a stack from an arbitrary SHA of the Hadoop git repository.

      1. 0001-BIGTOP-848-git-support.patch
        2 kB
        Philip Herron
      2. BIGTOP-848.patch
        2 kB
        Konstantin Boudnik
      3. flume-spec.patch
        1 kB
        Andrew Purtell
      4. hadoop-spec.patch
        1 kB
        Andrew Purtell
      5. hbase-spec.patch
        0.7 kB
        Andrew Purtell
      6. hive-spec.patch
        5 kB
        Andrew Purtell
      7. pig-spec.patch
        0.8 kB
        Andrew Purtell
      8. zookeeper-spec.patch
        0.8 kB
        Andrew Purtell

          Activity

          rvs Roman Shaposhnik added a comment -

          This would be really nice to have for upcoming 0.6.0

          cos Konstantin Boudnik added a comment -

          We have some initial patch that is likely to be contributed shortly.

          redbrain Philip Herron added a comment -

          Trying to figure out how to submit my patches

          redbrain Philip Herron added a comment -

          Not 100% but the idea is there.

          bmahe Bruno Mahé added a comment -

          I have not tested but it seems about right. Thanks a lot for the patch!
          Some comments:

          • Please comment out the 3 new lines in bigtop.mk and change the git URL to the Apache git repo.
          • if $($(PKG)_GIT_LOC) is not defined, it should exit -1, not just exit.
          • Why not use "git archive", which already does all the archiving for you?
          cos Konstantin Boudnik added a comment -

          Also, I don't think a comment in the patch is a good way to express rants: you'd be better off by opening a jira ticket on that

          rvs Roman Shaposhnik added a comment -

          Philip Herron, one thing I was thinking about is that perhaps Bigtop's code can be smart enough to use the URI for something like this. IOW, git://foo.bar would make it check out from git, etc. The http://foo.bar case is trickier since it can be either a file transfer or git-over-http, etc.

          The comments in the patch seem to be misplaced; they need to be removed. While the use of make(1) is definitely not a fundamental concept for Bigtop, it is a tool that works for a particular purpose. We'd love to hear proposals for a better tool. Please file JIRAs and attach patches.
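
For illustration, the scheme-based dispatch described above could be sketched in shell roughly as follows. This is only a sketch and is not taken from any attached patch: the function name `classify_source` and the suffix rules (`.git`, `.tar.gz`, `.tgz`, `.zip`) are assumptions made for the example, and plain http(s) URLs without a telling suffix are deliberately reported as ambiguous.

```shell
# Hypothetical helper: guess how a source location should be fetched.
# Prints one of: git, archive, ambiguous, unknown.
classify_source() {
    case "$1" in
        # The git:// scheme or a .git suffix is treated as a repository (assumption).
        git://*|*.git)
            echo git ;;
        # http(s) URLs ending in a known archive suffix are plain downloads.
        http://*.tar.gz|https://*.tar.gz|http://*.tgz|https://*.tgz|http://*.zip|https://*.zip)
            echo archive ;;
        # Any other http(s) URL could be a file or git-over-http.
        http://*|https://*)
            echo ambiguous ;;
        *)
            echo unknown ;;
    esac
}
```

For the ambiguous case the developer running the build would still have to say explicitly which fetch method to use.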

          cos Konstantin Boudnik added a comment -

          Roman Shaposhnik, is it really always the case that the server supports the "git:" scheme? I am sure it might be one way and/or another. So, I'd rather leave the spec to the developer who's running the build than to some soulless script.

          redbrain Philip Herron added a comment -

          Updating with a new patch; this seems to work for me with make hadoop-rpm

          bmahe Bruno Mahé added a comment - - edited

          Thanks a lot for the update. But I still do have the same comments as before.
          None of the issues I noted were addressed except the git url:

          • Please comment out the 3 new lines in bigtop.mk and change the git URL to the Apache git repo. By default Apache Bigtop should build components from their releases, and not their master/trunk branch
          • if $($(PKG)_GIT_LOC) is not defined, it should exit -1, not just exit. We should always signal errors
          • Why not use "git archive", which already does all the archiving for you? If we can reduce the complexity and room for error of the code, we should do so.

          I would love to be able to check in this patch since it is almost done, but we need first to get these issues addressed.
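
A minimal shell sketch of the "always signal errors" point, under stated assumptions: the helper name `require_var` and the variable `HADOOP_GIT_LOC` are hypothetical stand-ins for whatever `$(PKG)_GIT_LOC` expands to. The sketch returns status 1 instead of using `exit -1`, because a negative exit status is not portable across shells; any non-zero status is enough for make to stop the build.

```shell
# Hypothetical helper: fail loudly when a required setting is empty.
# $1 = name of the setting (used only in the message), $2 = its value.
require_var() {
    if [ -z "$2" ]; then
        echo "error: $1 is not defined" >&2
        return 1
    fi
}

# Example use inside a recipe (HADOOP_GIT_LOC is an assumed name):
#   require_var HADOOP_GIT_LOC "$HADOOP_GIT_LOC" || exit 1
```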

          redbrain Philip Herron added a comment -

          I have now fixed the first 2 points.

          I am unsure about the 3rd point, whether we should use git archive, because Bigtop is weird here. If you look in bigtop.mk, we pull down the -src.tar.gz but then rename it to just .tar.gz, and then the Hadoop spec file expects the tarball and expects to cd into -src. So there is a lot of silliness going on there with src/dst.

          rvs Roman Shaposhnik added a comment -

          We would appreciate if you could submit patches to reduce the level of silliness. Some of it is, indeed, silly. Some of it is a function of needing to integrate well with the underlying packaging systems.

          rvs Roman Shaposhnik added a comment -
          +	if [ -z $($(PKG)_LOC) ]; then $($(PKG)_LOC)=NONE; fi
          +	if [ -z $($(PKG)_GIT_BRANCH) ]; then $($(PKG)_GIT_BRANCH)=master; fi
          

          I'd recommend replacing this with the native make(1) way of doing things
          (in package.mk):

          diff --git a/package.mk b/package.mk
          index 611eb5d..735c519 100644
          --- a/package.mk
          +++ b/package.mk
          @@ -153,6 +153,12 @@ $(2)_PKG_NAME       ?= $$($(2)_NAME)
           # The default PKG_RELEASE will be 1 unless specified
           $(2)_RELEASE        ?= 1
           
          +# The default remote location is NONE
          +$(2)_LOC            ?= NONE
          +
          +# The default branch is master
          +$(2)_GIT_BRANCH     ?= master
          +
           $(2)_BUILD_DIR      = $(BUILD_DIR)/$(1)/
           $(2)_OUTPUT_DIR      = $(OUTPUT_DIR)/$(1)
           $(2)_SOURCE_DIR       = $$($(2)_BUILD_DIR)/source
          
          bmahe Bruno Mahé added a comment -

          I am unsure about the 3rd point, whether we should use git archive, because Bigtop is weird here. If you look in bigtop.mk, we pull down the -src.tar.gz but then rename it to just .tar.gz, and then the Hadoop spec file expects the tarball and expects to cd into -src. So there is a lot of silliness going on there with src/dst.

          1. git archive enables you to pick the root directory within the archive
          2. git archive would save you 5 lines of bash easily
          3. If my memory is right, some packaging tools expect the standard convention <NAME><VERSION>, so we have to rename it. But then, the content of the package still has the <NAME><VERSION>-src root directory. As Roman pointed out, feel free to send a patch to improve this situation.
          apurtell Andrew Purtell added a comment -

          For a time I was building a subset of RPM packages from arbitrary GitHub SHAs. If done in the packaging scripts nothing on the Bigtop makefile side is necessary. For RPM, don't use the %setup macro and instead expand the source tarball and set perms in the specfile. (Or redefine the %setup macro.) Hive was a bit more tricky because the directory layout of a source checkout is different from a release tarball.

          cos Konstantin Boudnik added a comment -

          Andrew Purtell, maybe I am missing something, but I don't see where the tarballs come from in the proposed .spec modification approach. Could you please elaborate on that a little?

          apurtell Andrew Purtell added a comment -

          The patches just make it possible to work with a tarball that unpacks into an arbitrarily named top level directory. There is an existing mechanism in Bigtop to download tarballs given a SHA as src and a GitHub tarball download path as site; I used that.

          cos Konstantin Boudnik added a comment -

          ok, so then the question is: what's the better approach - fixing specs and debs files; or just make a single change in the make file?

          apurtell Andrew Purtell added a comment -

          That doesn't seem like an either-or choice as long as the specs and debs hardcode tarball/directory names.

          apurtell Andrew Purtell added a comment -

          I should say, the specs (via the %setup macro) assume a certain top level directory name. When using a GitHub SHA source you can't control this unless you would like to unpack the tarball, rename the top level directory, and repack it before starting the build. Of course, that is an option.
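
One way to cope with an unpredictable top-level directory, sketched here as an assumption rather than as what the attached spec patches actually do, is to unpack the tarball into a directory name you choose and drop the archive's own leading path component. The helper name `unpack_flat` is hypothetical; `--strip-components` is supported by both GNU tar and bsdtar.

```shell
# Hypothetical helper: unpack a tarball into a known directory,
# discarding whatever top-level directory name the archive uses.
# Usage: unpack_flat <tarball.tar.gz> <destination-dir>
unpack_flat() (
    tarball=$1 dest=$2
    mkdir -p "$dest" &&
        tar -xzf "$tarball" -C "$dest" --strip-components=1
)
```

A spec file's %prep section could run a step like this in place of %setup, so the rest of the build can rely on a fixed directory name regardless of how the download was packaged.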

          redbrain Philip Herron added a comment -

          Hmm, I don't think touching the spec files is a good idea. They are fine, and handling all of this at the top level of Bigtop is the only way it makes sense to do something like this, because that is how we get the tarball, and it keeps the approach portable across Debian and RPM packaging.

          I will go and fix the patch according to Roman's idea.

          redbrain Philip Herron added a comment -

          Updated patch

          bmahe Bruno Mahé added a comment -

          Thanks!
          I have not tried it yet, but it looks good to me.

          cos Konstantin Boudnik added a comment -

          Bruno Mahé, git archive is an appealing feature to use; however, it seems that neither GitHub nor the ASF git mirror supports git archive from outside. GitHub does, though, provide a custom URL to download an archive for a branch.

          So, the only alternative to Phil's approach, which might be a bit sub-optimal performance-wise because of the repo cloning, is to provide custom URLs that let you download an archive directly, e.g. https://github.com/apache/hadoop-common/archive/branch-2.zip. However, this seems to be impractical because different servers will have different URL patterns.
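
To make the URL-pattern concern concrete, here is a small sketch that builds a GitHub-style archive URL following the pattern in the example above. The function name `github_archive_url` and its argument order are assumptions; any other hosting service would need its own pattern, which is exactly the drawback noted.

```shell
# Hypothetical helper: build a GitHub archive download URL.
# $1 = owner, $2 = repository, $3 = branch, tag, or SHA,
# $4 = archive format (zip or tar.gz; defaults to zip).
github_archive_url() {
    printf 'https://github.com/%s/%s/archive/%s.%s\n' "$1" "$2" "$3" "${4:-zip}"
}
```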

          cos Konstantin Boudnik added a comment - - edited

          Also, the patch doesn't work for a free-form component version, because

           cut -d'.' --complement -f4-
          

          doesn't leave much for imagination ;(

          Also, I see
          fatal: git checkout: updating paths is incompatible with switching branches.
          error message on the attempt to checkout the branch for the first time.

          cos Konstantin Boudnik added a comment -

          This latest BIGTOP-848.patch seems to be working a bit better.

          bmahe Bruno Mahé added a comment - - edited

          Konstantin Boudnik I believe we are talking about different things when referring to git archive. I was referring to the git archive command (see https://www.kernel.org/pub/software/scm/git/docs/git-archive.html ).
          Using the git archive command is a much better way to create a source tarball than to clone, checkout a branch/commit and then rm -rf .git.
          Note that I don't really have any issue with the way the url is passed, although if it was smarter I would not be against it.

          Note also that git archive enables you to specify any prefix, which can be useful since deb/rpm build tools expect source archives to have a specific prefix.
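
A minimal sketch of creating the tarball with the git archive command after a local clone, using `--prefix` to set the top-level directory. The function name `make_source_tarball` and its arguments are hypothetical and not part of any attached patch. Note that after a fresh clone, branches other than the default one are only available as `origin/<branch>`, so pass a SHA or a ref of that form.

```shell
# Hypothetical helper: build a source tarball from a git ref via clone + archive.
# Usage: make_source_tarball <repo-url> <ref> <prefix> <output.tar.gz>
make_source_tarball() (
    repo_url=$1 ref=$2 prefix=$3 out=$4
    # git archive runs inside the clone, so make the output path absolute first.
    case "$out" in
        /*) ;;
        *) out="$PWD/$out" ;;
    esac
    workdir=$(mktemp -d) || exit 1
    git clone --quiet "$repo_url" "$workdir/repo" &&
        (
            cd "$workdir/repo" &&
            # --prefix sets the top-level directory inside the archive.
            git archive --format=tar.gz --prefix="$prefix/" -o "$out" "$ref"
        )
    status=$?
    # Remove the temporary clone whether or not the archive step succeeded.
    rm -rf "$workdir"
    exit "$status"
)
```

This replaces the clone, checkout, and `rm -rf .git` sequence with a single archive step, and the prefix can be chosen to match whatever directory name the packaging scripts expect.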

          cos Konstantin Boudnik added a comment -

          Bruno, we are talking about the same thing, actually. As I said, I couldn't get git archive to succeed against GitHub repos. The remote server needs special protocol support for this command to work.

          bmahe Bruno Mahé added a comment -

          But nothing prevents you from cloning and then archiving, does it?

          cos Konstantin Boudnik added a comment -

          Nope, that approach should be cool. I was trying to optimize around it a little, because cloning something like Hadoop or Hive is an effort. I guess that's how we should go forward though.

          cos Konstantin Boudnik added a comment -

          I am not sure if we're still pursuing this approach. At any rate, the patches are likely to require an update. Canceling the patch for now.

          cos Konstantin Boudnik added a comment -

          There was a feature added to allow the use of source code from github - it seems to be good enough for now.


            People

            • Assignee:
              cos Konstantin Boudnik
              Reporter:
              cos Konstantin Boudnik
            • Votes:
              0
              Watchers:
              6
