[HIVE-81] Make forrest docs for Hive web site along the lines of http://hadoop.apache.org/core/ - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.3.0
Component/s: Documentation
Labels:
None

Description

Hive should ship with documentation, like Hadoop, instead of using the wiki as the official documentation repository. To get there, we'll need a set of xml files to grind through forrest, if we want to reuse the same mechanisms as the other sites.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HIVE-81.patch
27/Nov/08 04:27
213 kB
Jeff Hammerbacher
favicon.ico
02/Dec/08 18:25
0.7 kB
Jeff Hammerbacher
hadoop-logo.jpg
02/Dec/08 18:25
9 kB
Jeff Hammerbacher
hive_small.jpg
02/Dec/08 18:25
3 kB
Jeff Hammerbacher

Issue Links

blocks

HADOOP-4736 Add a link to Hive under "Related Projects" on http://hadoop.apache.org/core/

Resolved

Activity

Ascending order - Click to sort in descending order

Jeff Hammerbacher added a comment - 27/Nov/08 04:27

NOTE: this patch generated by an svn diff against a base hive checkout (svn co http://svn.apache.org/repos/asf/hadoop/hive hive-base).

Full Hive site and initial build with Apache Forrest. Suprisingly nontrivial to build, but I basically ripped off the HBase site, thanks guys!

I listed Ashish, Zheng, and Dhruba under credits.xml. Let's get this baby up ASAP!

Later,
Jeff

Jeff Hammerbacher added a comment - 27/Nov/08 04:27 NOTE: this patch generated by an svn diff against a base hive checkout (svn co http://svn.apache.org/repos/asf/hadoop/hive hive-base). Full Hive site and initial build with Apache Forrest. Suprisingly nontrivial to build, but I basically ripped off the HBase site, thanks guys! I listed Ashish, Zheng, and Dhruba under credits.xml. Let's get this baby up ASAP! Later, Jeff

Joydeep Sen Sarma added a comment - 27/Nov/08 04:31

Ashish and I talked about this earlier this week.

Another option (simpler than forrest) seems to be apt (wiki like). We can compile apt files via Maven. See:

http://maven.apache.org/guides/mini/guide-site.html

and

http://maven.apache.org/doxia/references/apt-format.html

thoughts?

one thing we noticed was that Google does not actually show up hadoop official documentation on search queries. Javadocs rank really high (probably because of all the hyper links and i have personally found package headers to be really good documentation for hadoop (but on the other hand - hadoop does not have much declarative stuff). (perhaps this is a second order problem - but i am concerned that the documentation we create painstakingly will be the one that users actually refer to)

Joydeep Sen Sarma added a comment - 27/Nov/08 04:31 Ashish and I talked about this earlier this week. Another option (simpler than forrest) seems to be apt (wiki like). We can compile apt files via Maven. See: http://maven.apache.org/guides/mini/guide-site.html and http://maven.apache.org/doxia/references/apt-format.html thoughts? one thing we noticed was that Google does not actually show up hadoop official documentation on search queries. Javadocs rank really high (probably because of all the hyper links and i have personally found package headers to be really good documentation for hadoop (but on the other hand - hadoop does not have much declarative stuff). (perhaps this is a second order problem - but i am concerned that the documentation we create painstakingly will be the one that users actually refer to)

Jeff Hammerbacher added a comment - 27/Nov/08 04:49

Introducing another dynamic documentation generator to the project seems like a larger decision that I wouldn't want to block on.

Good catch on rankings for Google search results: having fixed urls for the latest documentation would help. Second order problem though: I say we toss this bad boy up and open a new ticket to discuss making the web site for Hive better. I'm more of a fan of PHP/Python/real web development languages rather thank hacky XSLT stuff, but when in Rome...

Jeff Hammerbacher added a comment - 27/Nov/08 04:49 Introducing another dynamic documentation generator to the project seems like a larger decision that I wouldn't want to block on. Good catch on rankings for Google search results: having fixed urls for the latest documentation would help. Second order problem though: I say we toss this bad boy up and open a new ticket to discuss making the web site for Hive better. I'm more of a fan of PHP/Python/real web development languages rather thank hacky XSLT stuff, but when in Rome...

Doug Cutting added a comment - 01/Dec/08 20:34

> Google does not actually show up hadoop official documentation on search queries.

Can you give some examples? Y! or Google searches for "HDFS" both find:

http://hadoop.apache.org/core/docs/current/hdfs_design.html

Doug Cutting added a comment - 01/Dec/08 20:34 > Google does not actually show up hadoop official documentation on search queries. Can you give some examples? Y! or Google searches for "HDFS" both find: http://hadoop.apache.org/core/docs/current/hdfs_design.html

Joydeep Sen Sarma added a comment - 02/Dec/08 03:39

depends a lot on the query. don't remember what i tried the other day - but u are right - we get the hdfs arch document. but what we should have gotten in many cases are the dfs command manual.

for example: query="hadoop set file system replication level" (or try variants)

the best answer is probably: http://hadoop.apache.org/core/docs/r0.19.0/hdfs_shell.html#setrep

but this doesn't show up in the top 10 results (at least on goog)

i guess we can have examples both ways and perhaps this is a case of premature optimization. we should just get the docs in first - but the format question is still interesting (like wiki like format for apt).

Joydeep Sen Sarma added a comment - 02/Dec/08 03:39 depends a lot on the query. don't remember what i tried the other day - but u are right - we get the hdfs arch document. but what we should have gotten in many cases are the dfs command manual. for example: query="hadoop set file system replication level" (or try variants) the best answer is probably: http://hadoop.apache.org/core/docs/r0.19.0/hdfs_shell.html#setrep but this doesn't show up in the top 10 results (at least on goog) i guess we can have examples both ways and perhaps this is a case of premature optimization. we should just get the docs in first - but the format question is still interesting (like wiki like format for apt).

Zheng Shao added a comment - 02/Dec/08 04:11

I would say for consistency we would prefer forrest. All other sub projects are also using forrest. At least it gives the same look-and-feel.

If later all people believe apt wiki-like format is better, we should switch all sub projects.

It seems to me it is really important to get this out asap. The page could be simple (e.g. don't overlap with the existing README file in the hive package) but can serve as a single entry point for new users.

Zheng Shao added a comment - 02/Dec/08 04:11 I would say for consistency we would prefer forrest. All other sub projects are also using forrest. At least it gives the same look-and-feel. If later all people believe apt wiki-like format is better, we should switch all sub projects. It seems to me it is really important to get this out asap. The page could be simple (e.g. don't overlap with the existing README file in the hive package) but can serve as a single entry point for new users.

Jeff Hammerbacher added a comment - 02/Dec/08 05:55

The attached patch generates a simple Hive website with the same look and feel of the rest of the Hadoop subprojects using Apache Forrest. Let me know if we're ready to move forward and I can hit "Submit Patch" any time.

Jeff Hammerbacher added a comment - 02/Dec/08 05:55 The attached patch generates a simple Hive website with the same look and feel of the rest of the Hadoop subprojects using Apache Forrest. Let me know if we're ready to move forward and I can hit "Submit Patch" any time.

Zheng Shao added a comment - 02/Dec/08 08:44

I tried to do "ant". 3 image files are missing. Is that because svn diff/patch does not accept binary files?

[exec] X [0] images/hadoop-logo.jpg BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.hadoop-logo.jpg (No such file or directory)
[exec] * [13/10] [1/18] 0.253s 6.4Kb credits.html
[exec] X [0] images/hive_small.jpg BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.hive_small.jpg (No such file or directory)
[exec] * [15/8] [0/0] 0.111s 4.6Kb credits.pdf
[exec] X [0] images/favicon.ico BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.favicon.ico (No such file or directory)

Zheng Shao added a comment - 02/Dec/08 08:44 I tried to do "ant". 3 image files are missing. Is that because svn diff/patch does not accept binary files? [exec] X [0] images/hadoop-logo.jpg BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.hadoop-logo.jpg (No such file or directory) [exec] * [13/10] [1/18] 0.253s 6.4Kb credits.html [exec] X [0] images/hive_small.jpg BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.hive_small.jpg (No such file or directory) [exec] * [15/8] [0/0] 0.111s 4.6Kb credits.pdf [exec] X [0] images/favicon.ico BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.favicon.ico (No such file or directory)

Ashish Thusoo added a comment - 02/Dec/08 11:50

We talked about this a bit more. Considering that forrest is used by hadoop, we are fine with following the same route though the advantages of generating documents in the wiki style, that are available through doxia and maven are also quite neat. As Zheng mentions, we can address that later, though my guess is that it will be difficult to make a case to switch to another CMS for the other subprojects. Anyway, we do not need to reinvent the wheel here so I am fine with using forrest.

Will take a look at this in the afternoon today and send in my comments. Preliminarily, I think we should not check in generated code (which I think this patch is doing) and instead just check in the basic sources. Otherwise, we will have the same problem that Doug mentioned in a separate thread about small doc changes generating huge checkins and overwhelming the commit mails and svn etc...

Can you make the necessary changes so that the docs are created in the build directory (maybe build/docs and build/site).

Thanks...

Ashish Thusoo added a comment - 02/Dec/08 11:50 We talked about this a bit more. Considering that forrest is used by hadoop, we are fine with following the same route though the advantages of generating documents in the wiki style, that are available through doxia and maven are also quite neat. As Zheng mentions, we can address that later, though my guess is that it will be difficult to make a case to switch to another CMS for the other subprojects. Anyway, we do not need to reinvent the wheel here so I am fine with using forrest. Will take a look at this in the afternoon today and send in my comments. Preliminarily, I think we should not check in generated code (which I think this patch is doing) and instead just check in the basic sources. Otherwise, we will have the same problem that Doug mentioned in a separate thread about small doc changes generating huge checkins and overwhelming the commit mails and svn etc... Can you make the necessary changes so that the docs are created in the build directory (maybe build/docs and build/site). Thanks...

Jeff Hammerbacher added a comment - 02/Dec/08 18:24

All of the other Hadoop subprojects check in the latest version of the generated code under publish/; I'm just following convention by checking that in.

There are no docs generated at build time, so I'm not sure that adding a "docs" target to put these in docs/ makes sense right now. There's another ticket open for when there's actual documentation, rather than just the website.

I will attach the static files, as it appears they aren't included as the output of a patch.

Jeff Hammerbacher added a comment - 02/Dec/08 18:24 All of the other Hadoop subprojects check in the latest version of the generated code under publish/; I'm just following convention by checking that in. There are no docs generated at build time, so I'm not sure that adding a "docs" target to put these in docs/ makes sense right now. There's another ticket open for when there's actual documentation, rather than just the website. I will attach the static files, as it appears they aren't included as the output of a patch.

Ashish Thusoo added a comment - 02/Dec/08 19:16

I know that all the subprojects are doing this and the convention is to checkin the generated file, but I do think it makes more sense not to do that and just generate the code when the publish target is called. I am not sure that there are any significant advantages of checking in generated code but there seem to be a lot of disadvantages. This is also something that is being discussed in the larger hadoop context. The thread where this is being discussed is as follows:

http://mail-archives.apache.org/mod_mbox/hadoop-core-dev/200812.mbox/browser

Clearly even there the preference is not to check in the generated docs.

Ashish Thusoo added a comment - 02/Dec/08 19:16 I know that all the subprojects are doing this and the convention is to checkin the generated file, but I do think it makes more sense not to do that and just generate the code when the publish target is called. I am not sure that there are any significant advantages of checking in generated code but there seem to be a lot of disadvantages. This is also something that is being discussed in the larger hadoop context. The thread where this is being discussed is as follows: http://mail-archives.apache.org/mod_mbox/hadoop-core-dev/200812.mbox/browser Clearly even there the preference is not to check in the generated docs.

Doug Cutting added a comment - 02/Dec/08 19:56

In Core, we are moving away from checking in versioned end-user documentation. But we still intend to checkin the project website. Apache infrastructure prefers this. Versioned documentation will be extracted from release tarballs and posted to the website as part of the release process.

Doug Cutting added a comment - 02/Dec/08 19:56 In Core, we are moving away from checking in versioned end-user documentation. But we still intend to checkin the project website. Apache infrastructure prefers this. Versioned documentation will be extracted from release tarballs and posted to the website as part of the release process.

Jeff Hammerbacher added a comment - 04/Dec/08 00:18

Given Doug's comments, are there any other modifications to the patch required to get this patch committed?

Jeff Hammerbacher added a comment - 04/Dec/08 00:18 Given Doug's comments, are there any other modifications to the patch required to get this patch committed?

Doug Cutting added a comment - 04/Dec/08 01:06

I just committed this. Thanks, Jeff!

Doug Cutting added a comment - 04/Dec/08 01:06 I just committed this. Thanks, Jeff!

People

Assignee:: Jeff Hammerbacher

Reporter:: Jeff Hammerbacher

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 26/Nov/08 23:44

Updated:: 17/Dec/11 00:09

Resolved:: 04/Dec/08 01:06