TinkerPop / TINKERPOP-1218

Usage of toLocalIterator Produces large amount of Spark Jobs

    Details

      Description

      https://github.com/apache/incubator-tinkerpop/blob/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedOutputRDD.java#L72

      This line calls `toLocalIterator`, which creates a separate Spark job for every partition of the RDD. That floods the Spark UI with unimportant entries that are irrelevant to users attempting diagnostics. Since this RDD is relatively small, we should be fine switching this line to a `.collect()` call, which pulls the entire RDD down to the driver in a single job.

      So as long as the total size of this RDD is on the scale of megabytes, we can keep the Spark UI readable with:

              return IteratorUtils.map(memoryRDD.collect().iterator(), tuple -> new KeyValue<>(tuple._1(), tuple._2()));
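
      To make the job counts being compared here concrete, the following is a toy model in plain Java. It is not Spark code and not the TinkerPop implementation: `JobCountSketch` and its methods are hypothetical names, and the model assumes only that `toLocalIterator` runs one job per partition as the iterator advances, while `collect` runs one job for the whole dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of a partitioned dataset (hypothetical, not a Spark class). */
public class JobCountSketch {
    private final List<List<String>> partitions;
    private int jobsLaunched = 0;

    public JobCountSketch(List<List<String>> partitions) {
        this.partitions = partitions;
    }

    /** Models toLocalIterator: fetches partitions lazily, one "job" per partition. */
    public Iterator<String> toLocalIterator() {
        return new Iterator<String>() {
            private int index = 0;
            private Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Each time the current partition is exhausted, fetch the next one.
                while (!current.hasNext() && index < partitions.size()) {
                    jobsLaunched++;
                    current = partitions.get(index++).iterator();
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    /** Models collect: a single "job" gathers every partition at once. */
    public List<String> collect() {
        jobsLaunched++;
        List<String> all = new ArrayList<>();
        for (List<String> partition : partitions) {
            all.addAll(partition);
        }
        return all;
    }

    public int jobsLaunched() {
        return jobsLaunched;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));

        JobCountSketch lazy = new JobCountSketch(parts);
        lazy.toLocalIterator().forEachRemaining(x -> { });
        System.out.println("toLocalIterator jobs: " + lazy.jobsLaunched()); // 3

        JobCountSketch eager = new JobCountSketch(parts);
        eager.collect();
        System.out.println("collect jobs: " + eager.jobsLaunched()); // 1
    }
}
```

      In this model the job count for `toLocalIterator` grows with the number of partitions, while `collect` stays at one. The trade-off is that `collect` holds the whole dataset in driver memory at once, which is why the size caveat above matters.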
      

            People

            • Assignee: Marko A. Rodriguez (okram)
            • Reporter: Russell Spitzer (rspitzer)