Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version: 3.1.0-incubating
- Labels: None
Description
I always thought there was a Spark option to say stuff like default.persist=DISK_SER_1, but I can't seem to find it.
If no such option exists, then we should add it to Spark-Gremlin. For instance:
gremlin.spark.storageLevel=DISK_ONLY
See: http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
Then we would need to go through our ...cache() calls and change them to ...persist(StorageLevel.valueOf(conf.get("gremlin.spark.storageLevel", "MEMORY_ONLY"))).
The question then becomes: do we provide the flexibility for the user to configure the program's caching differently from the persisted RDD caching? Too many configurations sucks.
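A minimal sketch of the key-with-default lookup described above (the key name `gremlin.spark.storageLevel` is this issue's proposal; the `resolveStorageLevel` helper and plain `Map`-based config are illustrative stand-ins, since wiring in a real `SparkContext` would require a Spark dependency — note that Spark's actual parser for storage-level strings is `StorageLevel.fromString`, not `valueOf`):

```java
import java.util.HashMap;
import java.util.Map;

public class StorageLevelConfig {

    // Proposed configuration key from this issue (hypothetical until implemented).
    static final String STORAGE_LEVEL_KEY = "gremlin.spark.storageLevel";

    // Resolve the configured storage level, falling back to Spark's
    // cache() default of MEMORY_ONLY when the key is absent.
    static String resolveStorageLevel(final Map<String, String> conf) {
        return conf.getOrDefault(STORAGE_LEVEL_KEY, "MEMORY_ONLY");
    }

    public static void main(final String[] args) {
        final Map<String, String> conf = new HashMap<>();
        System.out.println(resolveStorageLevel(conf)); // MEMORY_ONLY

        conf.put(STORAGE_LEVEL_KEY, "DISK_ONLY");
        System.out.println(resolveStorageLevel(conf)); // DISK_ONLY

        // With a Spark dependency on the classpath, the cache() call sites
        // would then become (sketch):
        //   rdd.persist(StorageLevel.fromString(resolveStorageLevel(conf)));
    }
}
```

This keeps a single knob: `rdd.cache()` is just shorthand for `persist(MEMORY_ONLY)`, so defaulting to `MEMORY_ONLY` preserves today's behavior when the key is unset.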