Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Documentation
    • Labels:
      None

      Description

      I think some updates are in order for this Wiki page to more accurately reflect the goals of the project: https://cwiki.apache.org/confluence/display/BIGTOP/Requirement+for+adding+a+new+component+to+Bigtop+distribution.

      For starters, it should have a more prominent link on the Wiki and a summary on the project home page. Not only is it going to be an increasingly common question, but it's an important part of defining what "Bigtop" is really all about. I know when I first started on the project it was the first thing I looked for.

      Second, we say projects have to be "Big Data"-related. Judging by other documents* and existing discussions regarding new components, it seems like we really mean to say "Hadoop"-related, and I think the extra focus would be good (having said that, I do think of Spark as Hadoop-related).

        Activity

        Konstantin Boudnik added a comment -

        +1 on better navigation around the Wiki - always a good idea.

        As for better understanding the goals of the project, they seem to be clearly stated on the top page of the Wiki:

        Apache Bigtop has several purposes.
        
            Packaging
            Deployment
            Integration Testing
        

        I don't see how this can cause any confusion.

        Judging by other documents* and existing discussions regarding new components, it seems like we really mean to say "Hadoop"-related

        Hmm, if it says "BigData related", then that is what the sentence means in plain English. Besides, allowing such a modification would effectively prevent us from doing stacks for, say, Lucene, or Drill, or something else. Do we want such a limitation in place? I really don't think anyone would like to shoot himself in the foot before embarking on a marathon.

        I would like to add that this set of goals has served us well through the incubation and graduation processes. Now it might seem to some that this is a good time to update or otherwise modify the goals. In that case, our project has to have a clear set of bylaws defining, among other things, a set of voting rules, and so on and so forth. So, let's deal with that first and foremost.

        Eli Collins added a comment -

        Thanks for filing, Sean. Suggestions:

        1. A requirement that a component be part of the Apache Hadoop ecosystem (per your comment about "Big Data", we need to define that better, but I'd start with "integrates with and/or is powered by Apache Hadoop").
        2. A requirement that all components are added to trunk before release branches. We articulate something similar in Hadoop (see http://wiki.apache.org/hadoop/Roadmap); in general, it's good to be trunk-first so different Bigtop releases don't have inconsistent components.
        3. A requirement that new components work with all of the existing components before they are included in a release. New additions can bake in trunk or a feature branch, but we shouldn't release them until they work with the rest of the stack (for the other parts of the stack they interact with).
        4. I'd clarify that the projects' dependencies need to be ASFv2 compatible (which is implied, but it's better to be clear).
        5. I'd consider making integration testing a hard requirement.

        I agree that we should put this info prominently on the project homepage (version-controlled text, part of the generated docs). How about removing it from the wiki at the same time so we don't maintain two copies?

        I'd also update the docs to be explicit that Bigtop is about the integration of the Apache Hadoop ecosystem, not just the package and test development (which are parts of the overall integration effort).

        Bruno Mahé added a comment -

        Any change to any policy should be voted upon by the community.

        For starters, it should have a more prominent link on the Wiki and a summary on the project home page. Not only is it going to be an increasingly common question, but it's an important part of defining what "Bigtop" is really all about. I know when I first started on the project it was the first thing I looked for.

        Good call! Anything making the documentation more obvious and easy to find is a good thing.

        Second, we say projects have to be "Big Data"-related. Judging by other documents* and existing discussions regarding new components, it seems like we really mean to say "Hadoop"-related, and I think the extra focus would be good (having said that, I do think of Spark as Hadoop-related).

        IMHO, Apache "Hadoop"-related is still too narrow. What if there is a great tool that significantly enhances Apache Flume or Apache Oozie without having any Apache "Hadoop"-related feature? We would be missing out on a great opportunity.
        "Big Data"-related may not be as narrow as Apache "Hadoop"-related, but I don't see where the issue is. If the community wishes to provide some other great tool that is not necessarily related to Apache Hadoop but useful nonetheless, why prevent it?

        Tom White added a comment -

        The board resolution for graduation that was passed at the last board meeting states that the scope of Bigtop covers "integration, packaging, deployment and validation of a big data management software distribution based on Apache Hadoop". That phrase should be used in the Bigtop documentation.

        Konstantin Boudnik added a comment -

        Good point, Tom. Does it mean that we won't be able to build a Drill stack later in the game (as in, we'd be prohibited from doing so)?

        Arun Singla added a comment -

        I would like to add that this set of goals has served us well through the incubation and graduation processes.

        That is precisely why the goals should be crisply stated on the project homepage and in the documentation. So far, only a few people have contributed to the project, and now there will be many more. The goals were clear to these early contributors, but the lack of clarity would start causing confusion going forward.

        Our project has to have a clear set of bylaws defining, among other things, a set of voting rules, and so on and so forth. So, let's deal with that first and foremost.

        +1 on the need for clarity on the bylaws. But that doesn't mean we shouldn't improve wiki navigation and fix the general documentation.

        1. A requirement that a component be part of the Apache Hadoop ecosystem (per your comment about "Big Data", we need to define that better, but I'd start with "integrates with and/or is powered by Apache Hadoop").

        +1

        2. A requirement that all components are added to trunk before release branches. We articulate something similar in Hadoop (see http://wiki.apache.org/hadoop/Roadmap); in general, it's good to be trunk-first so different Bigtop releases don't have inconsistent components.

        +1. The community already has several commercially supported Hadoop distros that are meant to be stable and that have inconsistent sets of components for their respective corporate reasons. These already cause plenty of confusion in users' minds. The Bigtop distro doesn't necessarily need to be production-ready and doesn't need inconsistent branches. Bigtop first and foremost serves as a bleeding-edge collection of Hadoop components, so it needs to grow in one direction. With that in mind, a trunk-first model sounds fundamental to Bigtop, IMO.

        3. A requirement that new components work with all of the existing components before they are included in a release. New additions can bake in trunk or a feature branch but we shouldn't release them until they work with the rest of the stack (for the other parts of the stack they interact with).

        4. I'd clarify that the projects' dependencies need to be ASFv2 compatible (which is implied, but it's better to be clear).

        +1. It is better to be clear, even more so now.

        5. I'd consider making integration testing a hard requirement

        +1. Without integration testing, it would become a mess.

        Konstantin Boudnik added a comment - - edited

        I still don't see answers to the questions about what you guys consider legitimate components. E.g.:

        would effectively prevent us from doing stacks for, say, Lucene, or Drill, or something else

        and

        If the community wishes to provide some other great tool that is not necessarily related to Apache Hadoop but useful nonetheless, why prevent it?

        Until these are answered, no changes to the goals and charter of the project can be put in motion.

        Roman Shaposhnik added a comment -

        Now that we've graduated, I believe that having a clearly spelled-out set of bylaws as part of our permanent web site is a must for the project. As Tom pointed out, our ASF Board resolution is an obvious kernel for this document, but we still have to drill down into more detail to arrive at something similar to: http://hadoop.apache.org/bylaws.html

        Eli has offered a nice list of additional things that require further clarification and I see that Cos and others chimed in with really good questions. It seems like we need a way to coalesce all of this input and follow up with a PMC vote to make it official.

        If this sounds like a reasonable way forward, we can start a wiki page as a tool for coalescing all these ideas.

        The timing is right: I'm in the middle of the post-graduation infrastructure move, and this feels like exactly the right thing to get nailed down now.

        Tom White added a comment -

        Does it mean that we won't be able to build a Drill stack later in the game (as in, we'd be prohibited from doing so)?

        According to the Drill proposal (http://wiki.apache.org/incubator/DrillProposal) Drill will use Hadoop as a data source, so from that point of view it could be included in Bigtop.

        Konstantin Boudnik added a comment -

        How about Cassandra?

        Bruno Mahé added a comment -

        I don't think this ticket is the place for a discussion about Apache Bigtop's policies.
        I would rather recommend creating a thread on the mailing list. We don't have to wait for the bylaws to be established to start a discussion about our policies (although I would highly recommend waiting for the bylaws to be established before expecting any decision on policies).

        Konstantin Boudnik added a comment -

        Good idea, Bruno. If the discussion is to continue, please move it to the list.

        Sean Mackrory added a comment - - edited

        I've added a link to the Requirements page near the FAQ, and I've added what I consider to be a fair approximation of the quote Tom White posted from the board resolution. It sounds like there's consensus that any further changes will be part of a separate discussion on project by-laws, so I believe this issue is otherwise resolved.

        Unless somebody disagrees with the change in the Wiki, could a committer please resolve this? Thanks!

        edit: I also added a blurb about dependencies needing to be ASL 2.0-compatible.

        Roman Shaposhnik added a comment -

        Thanks, Sean! I'm closing this now. The changes look fine, but I took the liberty of replacing one bit of wording with a direct quote from our ASF charter:

        Apache Bigtop Project is responsible for the 
        creation and maintenance of software related 
        to a system for integration, packaging, 
        deployment and validation of a big data management
        software distribution based on Apache Hadoop
        
        Konstantin Boudnik added a comment -

        Well, I don't think my -1 has been addressed, so I consider closing this ticket premature.
        Also, how about Cassandra? Am I "allowed" to add Cassandra to the stack?

        Roman Shaposhnik added a comment -

        What was your specific objection? The wiki editing ended up being pretty constrained and nothing jumped out at me as objectionable, but perhaps I didn't notice something relevant.

        As for Cassandra – I don't see an immediate reason why it should be a problem, but I don't have much experience with it either.

        Sean Mackrory added a comment -

        Although I've never really thought of Cassandra as part of the Hadoop ecosystem, it looks like a pretty good case could be made for it if there was indeed interest in adding it: http://wiki.apache.org/cassandra/HadoopSupport. It's even listed on hadoop.apache.org as a "Hadoop-related project".

        I don't believe I put anything in the wiki that was actually disputed, though. There's nothing that attempts to define "based on Hadoop" any more than the board resolution does. We are already defining the project in terms of Hadoop in multiple other locations on the site (including the home page) so I didn't consider that concept a new one at all.

        Eli Collins added a comment -

        Is there a JIRA that tracks updating the website? As noted above, this wiki page is practically hidden, so it's not very useful.

        Roman Shaposhnik added a comment -

        All web site related activities are tracked by BIGTOP-727 at this point.

        Konstantin Boudnik added a comment -

        Although I've never really thought of Cassandra as part of the Hadoop ecosystem

        That's the whole point, Sean: the decision about what can or cannot be included in the stack should not be left to anyone's perception or passing inclinations. It should be the result of community consensus, which so far isn't the case with this JIRA.

        Sean Mackrory added a comment -

        Defining Bigtop in terms of Hadoop is already in multiple other locations on the website and was part of the board resolution. What did I add to the wiki that was more specific than that?

        Bruno Mahé added a comment - - edited

        Sean> I don't recall ever seeing any mention of components having to be "integrated with or powered by Apache Hadoop". Do you have a link?
        As stated above and in the documentation, Apache Bigtop covers "integration, packaging, deployment and validation of a big data management software distribution based on Apache Hadoop". This does not imply in any way that all components of Apache Bigtop must be "integrated with or powered by Apache Hadoop".

        But we should really give this ticket a rest and move any remaining questions to the mailing list, so tickets can remain actionable items.

        Konstantin Boudnik added a comment -

        And again: +1 on what Bruno has said.
        Let's stop having discussions on the damn tickets.


          People

          • Assignee:
            Sean Mackrory
            Reporter:
            Sean Mackrory
          • Votes:
            0
            Watchers:
            9