Spark / SPARK-6343

Make doc more explicit regarding network connectivity requirements


    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.2, 1.4.0
    • Component/s: Documentation
    • Labels:


      As a new user of Spark, I read through the official documentation before attempting to stand up my own cluster and write my own driver application. But only after attempting to run my app remotely against my cluster did I realize that full network connectivity (layer 3) is required between my driver program and the worker nodes (i.e., my driver was listening for connections from my workers).

      I returned to the documentation to see how I had missed this requirement. On a second read-through, I saw that the doc hints at it in a few places (e.g., the driver configuration properties, the Submitting Applications page, and the cluster overview) but never states it outright.

      I think stating the network connectivity requirements up front, in the overview section of the doc, would help would-be users better understand how Spark works. I suggest revising the diagram and accompanying text found on the overview page:

      so that it depicts at least the directionality of the network connections initiated (perhaps like so):

      and states that the driver must listen for and accept connections from other Spark components on a variety of ports.

      Please treat my diagram and text as strawmen: I expect more experienced Spark users and developers will have better ideas on how to convey these requirements.
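      One way the revised docs could make the requirement concrete is with a short submission example. The sketch below uses real Spark configuration properties (spark.driver.host, spark.driver.port, spark.blockManager.port) to pin the driver's listening ports to fixed values so that firewall rules can permit the worker-to-driver connections; the host names, port numbers, and application file are illustrative placeholders, not values from this issue.

      ```shell
      # Sketch: the driver must accept inbound connections from the workers,
      # so fix its advertised host and listening ports rather than letting
      # Spark pick random ephemeral ports.
      spark-submit \
        --master spark://master-host:7077 \
        --conf spark.driver.host=driver-host.example.com \
        --conf spark.driver.port=51000 \
        --conf spark.blockManager.port=51001 \
        my_app.py
      # Firewall rules on the driver host would then need to allow inbound
      # traffic from the worker nodes on ports 51000 and 51001.
      ```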




            • Assignee:
              parente Peter Parente
            • Votes:
              0
            • Watchers:
              2


              • Created: