  1. TinkerPop
  2. TINKERPOP-2081

PersistedOutputRDD materialises rdd lazily with Spark 2.x

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.4
    • Fix Version/s: 3.4.0, 3.3.5
    • Component/s: hadoop
    • Labels:
      None

      Description

      PersistedOutputRDD does not actually persist the RDD in Spark memory; it only marks the RDD for lazy caching at some later point. It looks like caching was eager in Spark 1.6, but in Spark 2.0 it is lazy.
      Lazy caching looks wrong for this case: the source graph could change after the snapshot is created, and the snapshot should not be affected by those changes.

      The fix itself is simple: PersistedOutputRDD should call any Spark action, for example count(), right after marking the RDD for persistence, so that the cache is populated eagerly.
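
The difference between lazy and eager caching described above can be sketched without Spark. The following is a toy model in plain Java, not the Spark or TinkerPop API: ToyRdd, LazySnapshotDemo and their methods are hypothetical names invented for illustration. Here persist() only sets a flag, and count() stands in for "any Spark action" that forces the data to be computed and cached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy stand-in for an RDD. This is NOT the Spark API; it only models
// "mark for caching now, fill the cache on the first action".
final class ToyRdd<T> {
    private final Supplier<List<T>> source; // re-reads the underlying data
    private boolean markedForCache = false;
    private List<T> cached = null;

    ToyRdd(Supplier<List<T>> source) {
        this.source = source;
    }

    // Modelled on persist(): sets a flag, computes and stores nothing.
    ToyRdd<T> persist() {
        markedForCache = true;
        return this;
    }

    // Modelled on an action: computes the data and, if marked, caches it.
    long count() {
        return materialize().size();
    }

    // A second action that returns a copy of the data.
    List<T> collect() {
        return new ArrayList<>(materialize());
    }

    private List<T> materialize() {
        if (cached != null) {
            return cached;
        }
        // Copy the source so that a cached result is a real snapshot.
        List<T> computed = new ArrayList<>(source.get());
        if (markedForCache) {
            cached = computed;
        }
        return computed;
    }
}

final class LazySnapshotDemo {
    // The snapshot is only marked for caching; the source changes before first use.
    static List<String> lazySnapshot() {
        List<String> graph = new ArrayList<>(Arrays.asList("v1", "v2"));
        ToyRdd<String> snapshot = new ToyRdd<String>(() -> graph).persist();
        graph.add("v3");           // source mutated after the "snapshot" was taken
        return snapshot.collect(); // first action runs only now and sees v3
    }

    // Same steps, but an action is forced right after persist().
    static List<String> eagerSnapshot() {
        List<String> graph = new ArrayList<>(Arrays.asList("v1", "v2"));
        ToyRdd<String> snapshot = new ToyRdd<String>(() -> graph).persist();
        snapshot.count();          // forces materialisation into the cache
        graph.add("v3");           // later change no longer reaches the snapshot
        return snapshot.collect();
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + lazySnapshot());  // lazy:  [v1, v2, v3]
        System.out.println("eager: " + eagerSnapshot()); // eager: [v1, v2]
    }
}
```

In this model the lazy snapshot picks up the later change to the source, while the eager one does not, which is the behaviour the proposed count() call is meant to guarantee. How the real PersistedOutputRDD was changed is not shown here.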

              People

              • Assignee:
                spmallette Stephen Mallette
              • Reporter:
                artem.aliev Artem Aliev
              • Votes:
                0
              • Watchers:
                2
