Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-6343

Make doc more explicit regarding network connectivity requirements

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • None
    • 1.3.2, 1.4.0
    • Documentation
    • None

    Description

      As a new user of Spark, I read through the official documentation before attempting to stand-up my own cluster and write my own driver application. But only after attempting to run my app remotely against my cluster did I realize that full network connectivity (layer 3) is necessary between my driver program and worker nodes (i.e., my driver was listening for connections from my workers).

      I returned to the documentation to see how I had missed this requirement. On a second read-through, I saw that the doc hints at it in a few places (e.g., driver config, submitting applications suggestion, cluster overview) but never outright says it.

      I think it would help would-be users better understand how Spark works to state the network connectivity requirements right up-front in the overview section of the doc. I suggest revising the diagram and accompanying text found on the overview page:

      so that it depicts at least the directionality of the network connections initiated (perhaps like so):

      and states that the driver must listen for and accept connections from other Spark components on a variety of ports.

      Please treat my diagram and text as strawmen: I expect more experienced Spark users and developers will have better ideas on how to convey these requirements.

      Attachments

        Activity

          People

            parente Peter Parente
            parente Peter Parente
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: