SPARK-11305

Remove Third-Party Hadoop Distributions Doc Page


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: Documentation
    • Labels: None

    Description

      There is a fairly old page in our docs that contains a bunch of assorted information about running Spark on Hadoop clusters. I think this page should be removed and its content merged into other parts of the docs, because the information is largely redundant and somewhat outdated.

      http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html

      There are four sections:

      1. Compile-time Hadoop version - I think this information can be removed in favor of what is on the "Building Spark" page. These days most "advanced users" build without bundling Hadoop, so I'm not sure giving them a bunch of different Hadoop versions sends the right message.

      2. Linking against Hadoop - this doesn't seem to add much beyond what is in the programming guide (see the build sketch after this list).

      3. Where to run Spark - redundant with the hardware provisioning guide.

      4. Inheriting cluster configurations - I think this would be better as a section at the end of the configuration page (see the sketch after this list).
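
      For reference, "linking against Hadoop" from an application boils down to a couple of build dependencies. A minimal build.sbt sketch (the versions here are illustrative assumptions, not something this issue prescribes):

          scalaVersion := "2.10.5"

          libraryDependencies ++= Seq(
            // Spark core API; "provided" because the cluster supplies Spark at runtime.
            "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
            // Pin hadoop-client to the HDFS version of the target cluster.
            "org.apache.hadoop" % "hadoop-client" % "2.6.0"
          )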
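
      Likewise, "inheriting cluster configurations" in practice means Spark builds its Hadoop Configuration from the XML files on its classpath (e.g. those under HADOOP_CONF_DIR). A minimal Scala sketch, assuming the cluster's core-site.xml defines the default filesystem:

          import org.apache.spark.{SparkConf, SparkContext}

          object InheritedConfCheck {
            def main(args: Array[String]): Unit = {
              // Master and deploy mode are expected to come from spark-submit.
              val sc = new SparkContext(new SparkConf().setAppName("inherited-conf-check"))
              // sc.hadoopConfiguration is populated from core-site.xml / hdfs-site.xml
              // found on the classpath, so cluster settings are inherited automatically.
              println("fs.defaultFS = " + sc.hadoopConfiguration.get("fs.defaultFS"))
              sc.stop()
            }
          }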

          People

            Assignee: Sean R. Owen (srowen)
            Reporter: Patrick Wendell (pwendell)
            Votes: 0
            Watchers: 3
