  1. TinkerPop
  2. TINKERPOP-1218

Usage of toLocalIterator produces a large number of Spark jobs

Details

    Description

      https://github.com/apache/incubator-tinkerpop/blob/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedOutputRDD.java#L72

      The `toLocalIterator` call at the line linked above ends up creating a separate Spark job for every partition (task) in the RDD. This floods the Spark UI with unimportant entries that are not relevant to users attempting diagnostics. Since this RDD is relatively small, we should be fine switching this line to a `.collect` call, which pulls the entire RDD down to the driver in one job.

      So as long as the total size of this RDD is on the scale of megabytes, we can keep the UI readable with:

              return IteratorUtils.map(memoryRDD.collect().iterator(), tuple -> new KeyValue<>(tuple._1(), tuple._2()));
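
      To illustrate why the job count differs, here is a toy model in plain Java. It is only a sketch: `FakeRdd`, `jobsLaunched`, and the helper methods are hypothetical names invented for this example and are not Spark or TinkerPop APIs. It assumes the behavior described above, namely that a local iterator fetches each partition with its own job while a collect fetches everything in a single job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: none of these names are Spark or TinkerPop APIs.
class JobCountSketch {

    // Stand-in for an RDD: a list of partitions plus a counter of "jobs" launched.
    static final class FakeRdd<T> {
        private final List<List<T>> partitions;
        final AtomicInteger jobsLaunched = new AtomicInteger();

        FakeRdd(List<List<T>> partitions) {
            this.partitions = partitions;
        }

        // Models collect(): a single job pulls every partition to the driver.
        List<T> collect() {
            jobsLaunched.incrementAndGet();
            List<T> all = new ArrayList<>();
            for (List<T> partition : partitions) {
                all.addAll(partition);
            }
            return all;
        }

        // Models toLocalIterator(): each partition is fetched lazily by its own job.
        Iterator<T> toLocalIterator() {
            return new Iterator<T>() {
                private int nextPartition = 0;
                private Iterator<T> current = Collections.emptyIterator();

                @Override
                public boolean hasNext() {
                    // Launch one "job" per partition as the consumer advances.
                    while (!current.hasNext() && nextPartition < partitions.size()) {
                        jobsLaunched.incrementAndGet();
                        current = partitions.get(nextPartition++).iterator();
                    }
                    return current.hasNext();
                }

                @Override
                public T next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }
    }

    // Drains the lazy iterator and reports how many jobs that took.
    static int jobsUsingLocalIterator(List<List<String>> partitions) {
        FakeRdd<String> rdd = new FakeRdd<>(partitions);
        Iterator<String> it = rdd.toLocalIterator();
        while (it.hasNext()) {
            it.next();
        }
        return rdd.jobsLaunched.get();
    }

    // Collects once, iterates the local list, and reports how many jobs that took.
    static int jobsUsingCollect(List<List<String>> partitions) {
        FakeRdd<String> rdd = new FakeRdd<>(partitions);
        Iterator<String> it = rdd.collect().iterator();
        while (it.hasNext()) {
            it.next();
        }
        return rdd.jobsLaunched.get();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"), Arrays.asList("d", "e"));
        System.out.println("toLocalIterator jobs: " + jobsUsingLocalIterator(partitions)); // 3
        System.out.println("collect jobs: " + jobsUsingCollect(partitions));               // 1
    }
}
```

      The trade-off in the model matches the one in this issue: the collect path holds all elements in driver memory at once, which is only reasonable while the RDD stays small.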
      

          People

            okram Marko A. Rodriguez
            rspitzer Russell Spitzer