[TINKERPOP-1218] Usage of toLocalIterator Produces large amount of Spark Jobs - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.1-incubating
Fix Version/s: 3.2.0-incubating, 3.1.2-incubating
Component/s: hadoop
Labels: None

Description

The `toLocalIterator` call at

https://github.com/apache/incubator-tinkerpop/blob/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedOutputRDD.java#L72

ends up creating a separate Spark job for every task in the RDD. This floods the Spark UI with unimportant entries that are not relevant to users trying to run diagnostics. Since this RDD is relatively small, we should be fine switching this line to a `.collect` call, which pulls the entire RDD down to the driver in a single job.

So as long as the total size of this RDD is on the scale of megabytes, we can keep the user interface readable with:

        return IteratorUtils.map(memoryRDD.collect().iterator(), tuple -> new KeyValue<>(tuple._1(), tuple._2()));
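To illustrate the difference in job counts, here is a minimal toy model in plain Java. It does not use Spark: `ToyRdd`, `jobsForLocalIterator`, and `jobsForCollect` are hypothetical names invented for this sketch, and the job counter only imitates the behavior described above (one job per partition for `toLocalIterator`, one job in total for `collect`).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LocalIteratorVsCollect {

    /** Toy stand-in for an RDD (NOT Spark): partitions plus a count of simulated jobs. */
    static final class ToyRdd<T> {
        private final List<List<T>> partitions;
        private int jobsLaunched = 0;

        ToyRdd(List<List<T>> partitions) {
            this.partitions = partitions;
        }

        int jobsLaunched() {
            return jobsLaunched;
        }

        /** Models collect(): a single job gathers every partition at once. */
        List<T> collect() {
            jobsLaunched++;
            List<T> all = new ArrayList<>();
            for (List<T> partition : partitions) {
                all.addAll(partition);
            }
            return all;
        }

        /** Models toLocalIterator(): each partition is fetched lazily by its own job. */
        Iterator<T> toLocalIterator() {
            return new Iterator<T>() {
                private int nextPartition = 0;
                private Iterator<T> current = Collections.emptyIterator();

                @Override
                public boolean hasNext() {
                    // Advance to the next partition only when the current one is exhausted.
                    while (!current.hasNext() && nextPartition < partitions.size()) {
                        jobsLaunched++; // one simulated job per partition fetched
                        current = partitions.get(nextPartition++).iterator();
                    }
                    return current.hasNext();
                }

                @Override
                public T next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }
    }

    /** Builds a toy RDD with one small element per partition. */
    static ToyRdd<String> toyRdd(int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(Collections.singletonList("key-" + i));
        }
        return new ToyRdd<>(partitions);
    }

    /** Drains the local iterator and reports how many simulated jobs ran. */
    static int jobsForLocalIterator(int numPartitions) {
        ToyRdd<String> rdd = toyRdd(numPartitions);
        Iterator<String> it = rdd.toLocalIterator();
        while (it.hasNext()) {
            it.next();
        }
        return rdd.jobsLaunched();
    }

    /** Collects everything and reports how many simulated jobs ran. */
    static int jobsForCollect(int numPartitions) {
        ToyRdd<String> rdd = toyRdd(numPartitions);
        rdd.collect();
        return rdd.jobsLaunched();
    }

    public static void main(String[] args) {
        System.out.println("toLocalIterator jobs: " + jobsForLocalIterator(8));
        System.out.println("collect jobs: " + jobsForCollect(8));
    }
}
```

In this model the job count for the local iterator grows with the number of partitions, while `collect` stays at one. The trade-off is that `collect` holds the whole result on the driver at once, which is why the proposal depends on the RDD staying small.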

People

Assignee: Marko A. Rodriguez

Reporter: Russell Spitzer

Votes: 0

Watchers: 2

Dates

Created: 11/Mar/16 17:36

Updated: 14/Mar/16 15:32

Resolved: 14/Mar/16 15:32