Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Hive should ship with documentation, like Hadoop does, instead of using the wiki as the official documentation repository. To get there, we'll need a set of XML files to grind through Forrest if we want to reuse the same mechanism as the other sites.
Attachments
- favicon.ico (0.7 kB, Jeff Hammerbacher)
- hadoop-logo.jpg (9 kB, Jeff Hammerbacher)
- hive_small.jpg (3 kB, Jeff Hammerbacher)
- HIVE-81.patch (213 kB, Jeff Hammerbacher)
Issue Links
- blocks: HADOOP-4736 Add a link to Hive under "Related Projects" on http://hadoop.apache.org/core/ (Resolved)
Activity
Ashish and I talked about this earlier this week.
Another option (simpler than Forrest) seems to be APT, a wiki-like format. We can compile APT files via Maven. See:
http://maven.apache.org/guides/mini/guide-site.html
and
http://maven.apache.org/doxia/references/apt-format.html
Thoughts?
One thing we noticed is that Google does not actually surface the official Hadoop documentation for search queries. Javadocs rank really high (probably because of all the hyperlinks), and I have personally found the package headers to be really good documentation for Hadoop (though, on the other hand, Hadoop does not have much declarative stuff). Perhaps this is a second-order problem, but I want to make sure that the documentation we painstakingly create is the one that users actually refer to.
Introducing another dynamic documentation generator to the project seems like a larger decision that I wouldn't want to block on.
Good catch on rankings in Google search results: having fixed URLs for the latest documentation would help. That's a second-order problem, though: I say we toss this bad boy up and open a new ticket to discuss making the Hive web site better. I'm more of a fan of PHP/Python/real web development languages than hacky XSLT stuff, but when in Rome...
> Google does not actually show up hadoop official documentation on search queries.
Can you give some examples? Y! or Google searches for "HDFS" both find:
It depends a lot on the query. I don't remember what I tried the other day, but you are right: we get the HDFS architecture document. What we should have gotten in many cases, though, is the DFS command manual.
For example: query="hadoop set file system replication level" (or try variants).
The best answer is probably: http://hadoop.apache.org/core/docs/r0.19.0/hdfs_shell.html#setrep
But this doesn't show up in the top 10 results (at least on Google).
I guess we can find examples both ways, and perhaps this is a case of premature optimization. We should just get the docs in first, but the format question is still interesting (e.g. a wiki-like format such as APT).
For consistency, I would prefer Forrest. All the other subprojects also use Forrest, so at least we get the same look and feel.
If everyone later agrees that the APT wiki-like format is better, we should switch all the subprojects.
It seems to me that it is really important to get this out ASAP. The page could be simple (e.g. not overlapping with the existing README file in the Hive package), but it can serve as a single entry point for new users.
The attached patch generates a simple Hive website with the same look and feel as the rest of the Hadoop subprojects, using Apache Forrest. Let me know if we're ready to move forward, and I can hit "Submit Patch" any time.
I tried running "ant", and three image files are missing. Is that because svn diff/patch does not handle binary files?
[exec] X [0] images/hadoop-logo.jpg BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.hadoop-logo.jpg (No such file or directory)
[exec] * [13/10] [1/18] 0.253s 6.4Kb credits.html
[exec] X [0] images/hive_small.jpg BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.hive_small.jpg (No such file or directory)
[exec] * [15/8] [0/0] 0.111s 4.6Kb credits.pdf
[exec] X [0] images/favicon.ico BROKEN: /xxx/apache-hadoop-hive-readonly/site/author/src/documentation/content/xdocs/images.favicon.ico (No such file or directory)
We talked about this a bit more. Since Hadoop uses Forrest, we are fine with following the same route, although the ability to generate documents in a wiki style through Doxia and Maven is also quite neat. As Zheng mentions, we can address that later, though my guess is that it will be difficult to make the case for switching the other subprojects to another CMS. Anyway, we do not need to reinvent the wheel here, so I am fine with using Forrest.
I will take a look at this this afternoon and send in my comments. Preliminarily, I think we should not check in generated code (which I think this patch does) and should instead check in just the sources. Otherwise, we will have the same problem Doug mentioned in a separate thread: small doc changes generate huge check-ins that overwhelm the commit mails, svn, etc.
Can you make the necessary changes so that the docs are created in the build directory (maybe build/docs and build/site)?
Thanks...
All of the other Hadoop subprojects check in the latest version of the generated code under publish/; I'm just following convention by checking that in.
There are no docs generated at build time, so I'm not sure that adding a "docs" target to put these in docs/ makes sense right now. There's another ticket open for when there's actual documentation, rather than just the website.
I will attach the static files, as it appears they aren't included in the output of a patch.
I know that all the subprojects do this and that the convention is to check in the generated files, but I do think it makes more sense not to, and to just generate the output when the publish target is called. I am not sure there are any significant advantages to checking in generated code, but there seem to be a lot of disadvantages. This is also being discussed in the larger Hadoop context, in the following thread:
http://mail-archives.apache.org/mod_mbox/hadoop-core-dev/200812.mbox/browser
Clearly even there the preference is not to check in the generated docs.
In Core, we are moving away from checking in versioned end-user documentation, but we still intend to check in the project website. Apache infrastructure prefers this. Versioned documentation will be extracted from release tarballs and posted to the website as part of the release process.
Given Doug's comments, are there any other modifications to the patch required to get this patch committed?
NOTE: this patch was generated by an svn diff against a base Hive checkout (svn co http://svn.apache.org/repos/asf/hadoop/hive hive-base).
Full Hive site and initial build with Apache Forrest. Surprisingly nontrivial to build, but I basically ripped off the HBase site, thanks guys!
I listed Ashish, Zheng, and Dhruba under credits.xml. Let's get this baby up ASAP!
Later,
Jeff