Details
- Type: Question
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version: 1.6.2
- Fix Version: None
- Environment: Windows Server 2008 R2 Standard
Description
I installed Spark standalone and ran a Spark cluster (one master and one worker) on a Windows Server 2008 machine with 16 cores and 24 GB of memory.
I ran a simple test: create a string RDD and simply return its first element. I used JMeter to measure throughput, but the highest I could reach was around 35 requests/sec. I thought Spark was powerful at distributed computation, so why is the throughput so limited in such a simple scenario, which involves only task dispatch and no real computation?
1. In JMeter I tested with both 10 threads and 100 threads; the difference was only around 2-3 requests/sec.
2. I tested both caching and not caching the RDD; the difference was only around 1-2 requests/sec.
3. During the test, CPU and memory usage stayed low.
Below is my test code:
import java.util.Map;
import org.apache.spark.api.java.JavaRDD;
import org.springframework.web.bind.annotation.*;

@RestController
public class SimpleTest {

    final static Map<String, JavaRDD<String>> simpleRDDs = initSimpleRDDs();

    @RequestMapping(value = "/SimpleTest", method = RequestMethod.GET)
    @ResponseBody
    public String testProcessTransaction() {
        return simpleRDDTest();
    }

    public static Map<String, JavaRDD<String>> initSimpleRDDs() {
        // builds the map holding "MyRDD" (body omitted in the original report)
    }

    public static String simpleRDDTest() {
        JavaRDD<String> rddData = simpleRDDs.get("MyRDD");
        return rddData.first();
    }
}
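For context (an illustration, not part of the original report): every HTTP request here launches a full Spark job for rddData.first(), and job scheduling on the driver typically costs tens of milliseconds, independent of how little work the job does. A minimal sketch of how that per-job overhead alone caps throughput; the 28 ms figure is an assumption chosen to match the observed ~35 requests/sec:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        // Assumed per-request overhead of scheduling one Spark job on the
        // driver (task serialization, dispatch, result collection).
        // This value is hypothetical; the real number depends on the cluster.
        double jobOverheadMs = 28.0;

        // If every request pays this fixed cost, the maximum sequential
        // throughput is bounded by 1000 ms / overhead, regardless of CPU load.
        double maxThroughput = 1000.0 / jobOverheadMs;
        System.out.printf("%.1f%n", maxThroughput); // prints 35.7
    }
}
```

Under this assumption the cap comes from serialized job scheduling on the driver rather than computation, which would be consistent with the low CPU and memory usage observed during the test.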