Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Currently flume (part of BigTop) depends on CDH components (thrift, zk, hadoop-core). How does Bigtop deal with this?

        Issue Links

          Activity

          Arun C Murthy added a comment -

          Thanks Roman!

          Roman Shaposhnik added a comment -

          @Arun,

          1. Correct. To be even more explicit, where Bigtop is coming from is building a Bigdata management distribution/stack. To that end we have a hard goal of harmonizing all dependencies between components in the distribution. IOW, if component A happens to be part of the distribution, all other components depending on A have to depend on the exact version of A that is part of Bigtop and nothing else (that is why we try very hard to make sure that there's a single copy of A's artifact(s) and that all other components reference it in some way). Hadoop is a good example of a component that is part of the distribution with almost everything else depending on it (hence we use techniques ranging from symbolic linking from lib to the actual Hadoop jars all the way to constructing a classpath). We also have a soft goal of harmonizing dependencies that do NOT come from the distribution itself. IOW, if Bigtop components depend on a non-Bigtop component foo, we would like to make sure that the version of foo is common, but we don't really make sure that there is just one copy of foo. log4j artifacts are a good example.
          2. At this point we draw a very hard line at ASL. If a piece of software is not covered by ASL, currently Bigtop will not be able to incorporate it. We try to focus on ASF projects, but if a really interesting piece of software comes along which is covered by ASL, but is not (yet?) part of the ASF – we might consider including it. Such was the case with Oozie, for example.
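
          The symbolic-linking technique mentioned in the first point can be sketched as follows. This is only an illustration under stated assumptions: the function name link_shared_jar and the example paths are hypothetical and are not taken from Bigtop's packaging scripts.

```shell
#!/bin/sh
# Sketch only: link_shared_jar is a hypothetical helper, not Bigtop code.
# It replaces a component's bundled copy of a jar with a symlink to the
# single copy owned by the distribution.
link_shared_jar() {
    shared_jar=$1      # e.g. /usr/lib/hadoop/hadoop-core.jar
    component_lib=$2   # e.g. /usr/lib/flume/lib

    # Refuse to create a dangling link if the shared jar is missing.
    if [ ! -f "$shared_jar" ]; then
        echo "shared jar not found: $shared_jar" >&2
        return 1
    fi

    mkdir -p "$component_lib" || return 1

    # -s creates a symlink, -f replaces any bundled copy already present.
    ln -sf "$shared_jar" "$component_lib/$(basename "$shared_jar")"
}
```

          Calling link_shared_jar /usr/lib/hadoop/hadoop-core.jar /usr/lib/flume/lib would leave the component with a link rather than its own copy, so only one version of the jar exists on the system.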
          Arun C Murthy added a comment -

          Bruno, many thanks for your clarifications.


          A couple of points which would do away with the last vestiges of my confusion:

          Drawing the distinction between projects packaged & integrated by Apache Bigtop (Hadoop, HBase, Pig, Hive, Oozie, Flume, etc.) and ones which are merely transitive dependencies (protocolbuffers, clover, guava etc.), I'm coming away understanding that:

          1. Apache Bigtop may choose to manipulate an upstream project's dependencies (e.g. with flume-0.9.3) iff necessary. As you say, you don't patch the upstream project tarball, but you may take the source and possibly manipulate the build and/or runtime dependencies, as was done with flume-0.9.3 for bigtop-0.2.0. Correct?
          2. Apache Bigtop may choose to include ASF-compliant projects. What does that mean? Are you merely talking about the ASL? Or, do you refer to actual projects developed under the aegis of the ASF? If it's the former, including flume-0.9.3 (again, under the distinction I drew) in bigtop-0.2.0 was maybe just a one-off?

          Thanks again for helping me understand Apache Bigtop, appreciate it.

          Bruno Mahé added a comment -

          1. I don't recall anything about it. I usually make sure all dependencies are Apache compliant and the ones we package have the right inter-dependencies set up (i.e. they don't ship their own hadoop jars). Beyond that, the dependencies a project picks are the responsibility of that project.

          2. I usually make sure each project does not ship its own zookeeper jar. So I expect to find a symlink to zookeeper jars in that flume package. If not, please open a ticket.

          3. If I understand your question correctly, this is something we can help projects with (helping projects ship with only ASF projects and dependencies of the same version), but this is outside of our control. As a matter of policy, we don't patch any upstream tarball. So we can't override dependencies if projects don't provide such a feature. Also, as an example, I see Apache Hadoop pulling jars such as clover, guava, guice, hsqldb and protocolbuffer. None of these dependencies are under the ASF, and Apache Hadoop would be unlikely to work if we stripped every non-ASF jar from its resulting build. From my point of view, this is a non-issue. The goal of Apache Bigtop as I see it is to provide a point of integration for all ASF-compliant projects related to Apache Hadoop. So I would not have any issue providing packaging, tests and deployment recipes for ASF-compliant projects. But I do not represent the community.

          4. We don't patch anything but we can still provide alternative implementations. In this case, it was done for packaging/practical reasons more than going around CDH hadoop.

          Are you trying to use flume-0.9.3? Why not use Apache Flume (incubating) 0.1.0 instead?

          Arun C Murthy added a comment -

          Bruno, that's exactly what I was looking for. Thanks!

          If you don't mind, I'd like a couple more clarifications while I have your attention:

          1. flume-0.9.3 also seems to depend on a custom version of thrift from CDH (again FLUME-959). Can you please clarify whether bigtop-0.2.0 likewise works around flume-0.9.3 so that it doesn't have that dependency either?
          2. Likewise for ZooKeeper? (again FLUME-959)

          Couple more questions, more along what we should expect from bigtop:

          1. As you pointed out flume-0.9.3 has never been released via the ASF Incubator, yet bigtop-0.2.0 includes it. Is this a one-off i.e. in future, can we expect bigtop-0.2.0 to not include non-ASF released software?
          2. AFAIK, bigtop aims to include only released artifacts from ASF. In this case, it seems that bigtop worked around flume deps on CDH, is this a one-off? Or, is it reasonable for bigtop to change the actual artifacts from the individual projects?

          Thanks again!

          Bruno Mahé added a comment - - edited
          [bruno@p8700 Downloads]$ wget http://www.apache.org/dist/incubator/bigtop/stable/repos/centos5/flume/flume-0.9.3.2-1.noarch.rpm
          [bruno@p8700 Downloads]$ less flume-0.9.3.2-1.noarch.rpm |grep hadoop
          lrwxr-xr-x    1 root    root                       31 Nov  2 20:52 /usr/lib/flume/lib/hadoop-core.jar -> /usr/lib/hadoop/hadoop-core.jar
          

          So even if flume 0.9.3 depends on CDH Hadoop, Apache bigtop (incubating) was enforcing the use of Apache Hadoop.
          Also keep in mind that flume 0.9.3 is a release of flume from before its incubation.

          Apache Flume (incubating) 0.1.0 is the very first and only (so far) release of Apache Flume (incubating), and it also depends only on Apache Hadoop.
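
          The same check can be run against an installed lib directory instead of the package listing. The following is a minimal sketch, assuming a hypothetical helper named find_bundled_jars; it reports jars that are regular files (bundled copies) rather than symlinks to the shared ones.

```shell
#!/bin/sh
# Sketch only: find_bundled_jars is a hypothetical helper.
# Usage: find_bundled_jars LIB_DIR PATTERN
# Prints every file under LIB_DIR whose name matches PATTERN and which is a
# regular file. find does not follow symlinks by default, so jars that are
# symlinks (like hadoop-core.jar in the listing above) are not reported.
find_bundled_jars() {
    lib_dir=$1
    pattern=$2
    find "$lib_dir" -type f -name "$pattern"
}
```

          For example, find_bundled_jars /usr/lib/flume/lib 'hadoop*.jar' would be expected to print nothing when every Hadoop jar in that directory is a symlink.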

          Arun C Murthy added a comment -

          Roman, thanks for the clarification.

          However, I'm still a little confused, so, please bear with me. Thanks in advance.


          From the bigtop-0.2.0 announcement (http://s.apache.org/bigtop-0.2.0) I landed up at http://www.apache.org/dist/incubator/bigtop/stable/.

          From there, I looked at http://www.apache.org/dist/incubator/bigtop/stable/repos/centos5/flume/ which lists flume-0.9.3.

          Now, looking at flume-0.9.3, AFAICS you hit FLUME-959 i.e. flume-0.9.* seems to depend on CDH3.

          As a result, bigtop-0.2.0, via flume-0.9.3 has a dependency on CDH (that exists on flume trunk too, not sure which branch is used for developing flume-ng).


          So, my question is - how does bigtop-0.2.0 deal with that transitive dependency on CDH?

          Am I missing something? Can you please clarify? Thanks.

          Roman Shaposhnik added a comment -

          @Arun, to reiterate – Bigtop does NOT depend on any artifacts that are coming out of CDH. In fact, releases of Bigtop never depend on source code coming out of SVN. Releases of Bigtop strive to package officially released artifacts of Apache software (this is called '0 patching policy'). I'm not sure what problem you're trying to address with FLUME-959, but if you look into the source packages from Bigtop's trunk:
          http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Flume/label=centos5/lastSuccessfulBuild/artifact/output/flume/flume-1.0.0.8-1.src.rpm
          and grep there (make sure to untar the source tarball first) for CDH (you can do grep -i to catch all types of spelling) you'll see that there are no matches.

          At this point, I'm not sure what else can be done on this JIRA, so please let us know if we can close it.

          Of course, if you happen to find any references to CDH – please do bring those to our attention.

          Thanks,
          Roman.

          P.S. Speaking of Bigtop picking up tarballs – any chance to rename 0.23.1 to NOT have -src postfix in the source tarball?
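
          The grep check described above can be sketched as a small function. It assumes the source package has already been unpacked into a directory (rpm2cpio piped into cpio is one common way to extract a src.rpm, followed by untarring the source tarball); the name count_cdh_references is hypothetical.

```shell
#!/bin/sh
# Sketch only: count_cdh_references is a hypothetical helper.
# Usage: count_cdh_references DIR
# Prints the number of files under DIR that contain "cdh" in any letter case.
count_cdh_references() {
    # -r recurses, -i ignores case, -l lists each matching file once.
    # tr strips the padding that some wc implementations add.
    grep -r -i -l 'cdh' "$1" 2>/dev/null | wc -l | tr -d ' '
}
```

          A result of 0 for the unpacked tree corresponds to the "no matches" outcome described in the comment.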

          Arun C Murthy added a comment -

          Also, to clarify, Bigtop 0.2.0 has a dep on CDH - correct? Or, am I missing something? Thanks.

          Arun C Murthy added a comment -

          Thanks Roman. I still see the deps (FLUME-959) in flume trunk. Is bigtop-0.3.0 blocked on FLUME-959 then?

          Roman Shaposhnik added a comment -

          @Arun,

          the upcoming Bigtop 0.3.0 is going to include Flume-NG (AKA the first Apache release of Flume). That change is currently in Bigtop's trunk and the nightly packaging builds are available from here:
          http://bigtop01.cloudera.org:8080/view/Bigtop-trunk/job/Bigtop-trunk-Repository/

          Please let me know if you still see this problem with Flume-NG

          Arun C Murthy added a comment -

          Linking FLUME-959.


            People

            • Assignee:
              Bruno Mahé
              Reporter:
              Arun C Murthy
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:

                Development