[SPARK-6343] Make doc more explicit regarding network connectivity requirements - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3.2, 1.4.0
Component/s: Documentation
Labels:
None

Description

As a new user of Spark, I read through the official documentation before attempting to stand-up my own cluster and write my own driver application. But only after attempting to run my app remotely against my cluster did I realize that full network connectivity (layer 3) is necessary between my driver program and worker nodes (i.e., my driver was listening for connections from my workers).

I returned to the documentation to see how I had missed this requirement. On a second read-through, I saw that the doc hints at it in a few places (e.g., driver config, submitting applications suggestion, cluster overview) but never outright says it.

I think it would help would-be users better understand how Spark works to state the network connectivity requirements right up-front in the overview section of the doc. I suggest revising the diagram and accompanying text found on the overview page:

so that it depicts at least the directionality of the network connections initiated (perhaps like so):

and states that the driver must listen for and accept connections from other Spark components on a variety of ports.

Please treat my diagram and text as strawmen: I expect more experienced Spark users and developers will have better ideas on how to convey these requirements.

Attachments

Issue Links

links to

[Github] Pull Request #5382 (parente)

Activity

People

Assignee:: Peter Parente

Reporter:: Peter Parente

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 15/Mar/15 20:41

Updated:: 09/Apr/15 10:39

Resolved:: 09/Apr/15 10:38