Bigtop / BIGTOP-1080

Change /usr/bin scripts to be alternatives instead of flat files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: debian, general, rpm
    • Labels: None

      Description

      I think it would be a good idea to convert our /usr/bin scripts to be alternatives (i.e. symlinks) instead of flat files just like our configuration directories (/etc/component/config). It would make the package deployment more flexible for our users.
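
      For readers unfamiliar with the mechanism, here is a minimal sketch of how an alternatives-style layout resolves a /usr/bin name through symlinks. It is a simplified model in Python, not the real update-alternatives tool and not the patch attached to this ticket. The provider names, priorities, and install paths are hypothetical, and everything is created under a temporary directory rather than the real filesystem.

```python
import os
import tempfile


def install_alternative(root, name, target, priority, registry):
    """Register `target` as a candidate for `name` and repoint the symlinks.

    Models the two-level layout used by alternatives systems:
    <root>/usr/bin/<name> -> <root>/etc/alternatives/<name> -> best candidate.
    """
    registry.setdefault(name, {})[target] = priority
    # The candidate with the highest priority wins (automatic mode).
    best = max(registry[name], key=registry[name].get)

    alt_link = os.path.join(root, "etc", "alternatives", name)
    bin_link = os.path.join(root, "usr", "bin", name)
    for link, dest in ((alt_link, best), (bin_link, alt_link)):
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if os.path.islink(link):
            os.remove(link)
        os.symlink(dest, link)
    return best


def demo():
    root = tempfile.mkdtemp()
    registry = {}
    candidates = {}
    # Hypothetical providers of a `pig` launcher, each with its own priority.
    for provider, priority in (("bigtop", 30), ("other-distro", 10)):
        path = os.path.join(root, "usr", "lib", provider, "bin", "pig")
        os.makedirs(os.path.dirname(path))
        with open(path, "w") as f:
            f.write("#!/bin/sh\necho pig from " + provider + "\n")
        candidates[provider] = path
        install_alternative(root, "pig", path, priority, registry)
    resolved = os.path.realpath(os.path.join(root, "usr", "bin", "pig"))
    return resolved, candidates


if __name__ == "__main__":
    resolved, candidates = demo()
    print(resolved == os.path.realpath(candidates["bigtop"]))
```

      The point of the indirection is that two providers can both ship a launcher without owning the /usr/bin path itself; only the symlink target changes. It works only if every provider registers through the same mechanism.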

        Activity

        Mark Grover added a comment -

        Patch uploaded. I built almost all components, tested a few, and gave it a thorough self-review before uploading.

        Mark Grover added a comment - - edited

        Review at https://reviews.apache.org/r/14222/

        Feedback welcome and much appreciated!

        Bruno Mahé added a comment - - edited

        Thanks for putting together a patch, but I fail to see what problem this ticket solves.

        Why do you think this is a good idea and how is this helping users?
        Which problem is it solving? How often and in what context do you encounter such an issue?
        What would be the use cases for our users?
        Configuration files need to use alternatives since users need to be able to point to different (pseudo-)clusters. But we have nothing else to point our binaries to.

        So far, I don't think we should include this patch for the following reasons:

        • This adds a lot of complexity without any benefit for Apache Bigtop
        • There are already many levels of indirection (way too many actually) in our packages
        • I don't see what this additional level of indirection does that cannot be achieved through other means
        • If a user needs to use alternatives for binaries, he or she should not use packages in the first place but probably tarballs. Packages are made to be integrated with the system and therefore are not made to be taken apart. Packages should be simple and stupid so they just work. Even more so when one can just rename his scripts.
        Mark Grover added a comment -

        Hi Bruno!
        Thanks for reviewing this. Here is my intent:

        I think it would be nice to be able to run multiple versions of, say, Pig against the same cluster. While this change doesn't achieve that for us on its own, it's a step in that direction. Alternatives provide a convenient way to switch over bin scripts.

        Also, I see Bigtop as a front-runner to become a standard distribution of Hadoop and the ecosystem. With plans for Hadoop showing up in Fedora 20 (https://fedoraproject.org/wiki/Changes/Hadoop), I'd personally like Bigtop (and I am open to feedback) to be less conflicting and more co-existent with other distributions. And, like I said before, while this change doesn't do that completely, it's a change in that direction, where the symlinks for bin scripts (if the approach is followed by other distributions) would co-exist.

        Thanks again! Let me know what you think.

        Sean Mackrory added a comment - - edited

        +1 on the intent, but -0 on the implementation. I'd love to see more discussion around a simple, unified solution to isolating Bigtop's artifacts from other distributions and deployment methods.

        I see where Mark's coming from and I've been concerned about the OS vendor distributions for some time. In fact, we're seeing some of that right now with name conflicts with the ideal names for the Avro and Spark packages. Package names are just one aspect, and I would bet that any other Hadoop distribution is also going to use /usr/bin/hadoop, /var/log/hadoop, etc. We'll need to do something like add a Bigtop prefix to package names (or make sure Bigtop repositories have higher priorities than base OS repositories), and enforce package conflicts with other distributions, or hope that the OS vendors also use alternatives in places where we have not yet.

        I think this is a much bigger discussion and this patch is only a partial solution, so for now I think we should err on the side of simplicity until we can figure out a longer-term, bigger-picture strategy.

        Bruno Mahé added a comment -

        > I think it would be nice to be able to run multiple versions of, say, Pig against the same cluster. While this change doesn't achieve that for us on its own, it's a step in that direction. Alternatives provide a convenient way to switch over bin scripts.

        We do not package more than one version of Pig. So in an Apache Bigtop context, I do not see the issue. In a non-Apache Bigtop context, see further down.
        Also, the usual way to deal with multiple versions of a package is to suffix the scripts/packages with their versions. See packages python vs python26 on CentOS 5. Or see also packages for Apache Flume 1 and 2 in Apache Bigtop a few months ago.
        I would lean toward having versions appended to the scripts since it would make the version being used more obvious. Also, if we need to package more than one version, it means they are not compatible/interchangeable since each version fulfills a different need. Therefore having multiple versions behind the same alternative would only add confusion and complexity.
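
        (Editorial illustration, not part of the original comment.) The version-suffix convention described above can be sketched as follows. This is a minimal Python sketch assuming a python26-style scheme, where the major and minor version digits are appended to the script name. The helper names and the example tool and versions in the usage note are hypothetical.

```python
def versioned_script_name(name, version, parts=2):
    """Append the first `parts` version components to `name`.

    For example, ('python', '2.6.8') gives 'python26'.
    """
    digits = version.split(".")[:parts]
    return name + "".join(digits)


def plan_bin_paths(name, versions, bindir="/usr/bin"):
    """Map each version to its own script path; fail if two versions collide."""
    plan = {}
    for version in versions:
        path = bindir + "/" + versioned_script_name(name, version)
        if path in plan.values():
            raise ValueError("collision on " + path)
        plan[version] = path
    return plan
```

        With this scheme, plan_bin_paths("mytool", ["1.4.0", "2.0.1"]) assigns /usr/bin/mytool14 and /usr/bin/mytool20, so both versions can be installed side by side and the version in use is visible in the command name. Two releases that share the same major and minor digits would collide, which the sketch reports as an error rather than silently overwriting.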

        > I'd personally like Bigtop (and I am open to feedback) to be less conflicting and more co-existent with other distributions.

        So do I. But Apache Bigtop taking unilateral actions will not foster consensus. Other distributions have other requirements and needs. And this patch may or may not accommodate these needs and requirements.
        We should get people from other distributions involved in the first place so we can all agree on a solution and implement it. I can volunteer to ping some of the Fedora folks to help get this started and see if they are interested.
        While I appreciate the intent of this ticket, I do not agree with this patch (unless this idea gets sold to most distributions or I am missing something). This patch may still have the very same issues it is trying to fix. For instance what if other distributions decide to not honor the alternatives set up for the bin scripts? We would end up with the same conflicting paths and additional complexity.

        > this change doesn't do that completely, it's a change in that direction, where the symlinks for bin scripts (and if followed by other distributions) would co-exist.

        I would also encourage you to write down your plan on the wiki or mailing list. This would enable the community to review your plan and give some feedback before you spend any time working on it. And you may even get volunteers for some of the work.
        It would also provide some very useful context to your patches.

        So for now I would rather not commit this patch and wait to see how the community wants to deal with multiple distributions of some of our projects.

        Mark Grover added a comment -

        Thanks all for your input, Bruno and Sean! One thing that we all agree on is that this JIRA is not the right platform for discussing and solving the larger problem at hand. So, I am going to mark this as Won't Fix and use a wiki page or mailing list to discuss further. Thanks again!

        Roman Shaposhnik added a comment -

        > So do I. But Apache Bigtop taking unilateral actions will not foster consensus. Other distributions have other requirements and needs. And this patch may or may not accommodate these needs and requirements. We should get people from other distributions involved in the first place so we can all agree on a solution and implement it. I can volunteer to ping some of the Fedora folks to help get this started and see if they are interested.

        Very well said! I think we can no longer pretend that we're the only ones delivering Hadoop ecosystem bits onto the system.


          People

          • Assignee: Mark Grover
          • Reporter: Mark Grover
          • Votes: 0
          • Watchers: 4