FLUME-1987

Improving Documentation for Apache Flume

    Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.0.0, v1.2.0, v1.4.0, v1.3.1
    • Fix Version/s: None
    • Component/s: Docs, Web
    • Labels:
      None

      Description

      Hello Everyone,

      I have been giving this a great deal of thought over the last 3 weeks.

      I would really appreciate feedback on how we should proceed on this from users, developers (committers), and project management perspectives.

      It would really be nice for the documentation for the project to be moved from git into the wiki and organized in a manner that makes it easy to locate information on features, components, processes etc.

      This will allow us to:

      1. Break things down into easily digestible chunks of information rather than having it in one long page.

      2. Be able to update and publish information for previous and future releases immediately.

      Some of the wiki pages are buried deep down and may not be reachable from the wiki home page.

      It's also a bit confusing to find information, because there are two versions of the wiki home page:

      https://cwiki.apache.org/FLUME/home.html

      https://cwiki.apache.org/confluence/display/FLUME/Home

      I like how projects like Apache Solr (http://wiki.apache.org/solr) and PHP.net (http://www.php.net/manual/en) organize information.

      I think we can learn a few things from how those projects structure their documentation.

      We could add comments on specific features where they behave, or should be configured, differently across versions/releases.

      What I would like to start with is:

      1. Re-organization of the wiki home page to have a navigable table of contents carefully broken down into easy-to-digest sections.
      2. Sections that document the overall architecture of the product.
      3. Sections that document the various features of the product (Sources, Channels, Sinks, Interceptors, Deserializers, etc.).
      4. Sections that document tips, techniques and processes for contributors, developers troubleshooting issues, active committers and PMC members.
      5. FAQ index compiling and providing solutions to commonly-asked questions from the user and developer mailing lists.
      6. Re-introduction of Flume explaining what it is and what it isn't. Also explaining use cases where Flume is applicable.

      Making this information readily available to newcomers will really improve the rate of adoption and strengthen the community in the immediate future.

      After we have gathered enough feedback and have some direction as to how we prefer to proceed, I will create a group of tasks or user stories for the various components of the documentation efforts.

      Please add your responses and comments to this JIRA issue so that we can track it and collaborate efficiently.

      Thanks.

        Activity

        Erik Bertrand added a comment -

        My feedback:

        • I find the jQuery documentation to be well-organized, as well as attractive
        • Comments on pretty much any documentation page, or at least the "reference" documentation, are essential; this is what makes the PHP documentation particularly useful
        • That said, infusing "real-world" use cases into the documentation – even the reference pages – would be very useful; perhaps simply through an "Examples" section
        • fwiw, I did have trouble finding the right Flume documentation as I was getting into using the product, finding both the main and the Confluence versions; it was difficult to know which was the "bible", or at least the latest
        • I'd suggest a section that lists version history with changelog; perhaps it could simply be linked to

        This is an excellent project, well worth the effort! Thank you for spearheading it.

        Israel Ekpo added a comment -

        Thanks Erik for your feedback, I will keep the jQuery documentation structure in mind as we continue with the analysis.

        Edward Sargisson added a comment -

        There were some comments on the dev list about the documentation required for developers to build the code. Since I've recently had to learn to do this I thought I would document my findings somewhere.

        1. The umask needs to be 0022 or the HDFS unit tests will fail (they create a mini cluster on disk and HDFS checks that the files have the expected permissions).
        2. The patch from FLUME-1262 needs to be applied; otherwise the build will fail attempting to create javadoc on jar files that haven't been built yet.
        3. Setting the MAVEN_OPTS is important - but not always documented where you expect it.
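
        A minimal sketch of how items 1 and 3 above could be checked before starting a build. The function names (current_umask, check_build_environment) are hypothetical and not part of Flume. The script only checks that MAVEN_OPTS is set to something, since no specific value is given here, and it does not attempt to detect whether the FLUME-1262 patch has been applied.

```python
import os

# The umask value named in item 1 above.
REQUIRED_UMASK = 0o022


def current_umask() -> int:
    """Return the process umask without permanently changing it."""
    previous = os.umask(0)  # os.umask sets a new mask and returns the old one
    os.umask(previous)      # restore the original mask immediately
    return previous


def check_build_environment(env=None) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []

    mask = current_umask()
    if mask != REQUIRED_UMASK:
        problems.append(
            f"umask is {mask:04o}, expected {REQUIRED_UMASK:04o}"
        )

    # Only presence is checked; the right value depends on the build machine.
    if not env.get("MAVEN_OPTS", "").strip():
        problems.append("MAVEN_OPTS is not set")

    return problems


if __name__ == "__main__":
    for problem in check_build_environment():
        print("WARNING:", problem)
```

        Run from the shell that will launch the build, this prints one warning per problem and nothing when both checks pass.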

        Mike Percy added a comment -

        Hi Israel & gents, I would love to see the documentation improved. However I am not sure moving all the docs to the wiki is the right answer.

        Particularly for reference material (configuration options and the like), a snapshot for each release is very useful, which is why it's checked into the codebase. However I think it's obvious that we are missing a lot of useful documentation.

        The way I see it, at a high level without delving into the important details of content and organization, we have two viable approaches:
        1. Create a new web site that contains the new docs and check in the source to this web site in Apache SVN.
        2. Lean more heavily on the Wiki by improving its design and architecture.

        Either way I feel it's important to keep the reference docs with the code, so that we can ask people to provide patches which update the user docs whenever they provide a new feature.

        Please let me know your thoughts.

        Mike Percy added a comment -

        By the way, we could also add to the existing web site which is located @ https://svn.apache.org/repos/asf/flume/site/trunk/

        There is currently a manual step where the user guide & dev guide docs are copied over to the web site before deploying it, which is less than ideal but works. Note that for Apache project web sites (aside from the wiki), the process of building the web site has to comply with svnpubsub per Infra <http://www.apache.org/dev/project-site.html> ...

        tl;dr: My understanding is that we have to check a built, static copy of the site in to Apache SVN in order to host the web site on flume.apache.org

        Roshan Naik added a comment -

        "Hi Israel & gents, I would love to see the documentation improved. However I am not sure moving all the docs to the wiki is the right answer."

        Yes, I feel the same.

        Israel Ekpo added a comment -

        Excellent feedback Gentlemen,

        I see your point about keeping the reference docs with the code because it forces one to update the docs whenever the code is updated.

        Regarding the documentation site, I am leaning more towards using the wiki because it is much faster and easier to format, update and share materials there than to check in site materials via source control.

        So the reference documentation could stay with the code but examples and information that would elaborate on the features of the product can live in the wiki.

        I will think more about this and see what we can come up with in terms of how to attack this.

        Roshan Naik added a comment -

        The user guide is also version-specific.

        Ralph Goers added a comment -

        The documentation on the web site has to be manually copied there because it is for a specific version and because it requires Maven and Sphinx to build it.

        The ASF CMS allows other options where content could be directly edited on the CMS if that was wanted. However, you would still probably want to copy that to another directory so you could keep track of the documentation for each release. Also, part of the process is to build a PDF version of the documentation for each release to make it easier for users to have an offline version of the documentation.


          People

          • Assignee: Israel Ekpo
          • Reporter: Israel Ekpo
          • Votes: 0
          • Watchers: 6

          Time Tracking

          • Original Estimate: 672h
          • Remaining Estimate: 672h
          • Time Spent: Not Specified