"context.py"
def addFile(self, path):
    """
    ...
    >>> from pyspark import SparkFiles
    >>> path = os.path.join(tempdir, "test.txt")
    >>> with open(path, "w") as testFile:
    ...    testFile.write("100")
    >>> sc.addFile(path)
    >>> def func(iterator):
    ...    with open(SparkFiles.get("test.txt")) as testFile:
    ...        fileVal = int(testFile.readline())
    ...    return [x * 100 for x in iterator]
    >>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
    [100, 200, 300, 400]
    """
This example writes 100 to a temp file, distributes the file, and is supposed to use its value as the multiplier (to verify that worker nodes can read the distributed file).
But look at these lines: the result is never affected by the distributed file:
...        fileVal = int(testFile.readline())
...    return [x * 100 for x in iterator]
I'm sure this code was intended to be:
...        fileVal = int(testFile.readline())
...    return [x * fileVal for x in iterator]
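For reference, here is a standalone sketch of the corrected example, rewritten so it can run outside the doctest environment (the app name and temp-directory handling here are illustrative, not part of the original docstring):

    import os
    import tempfile

    from pyspark import SparkContext, SparkFiles

    sc = SparkContext("local", "addFile-example")

    # Write the multiplier to a temp file and distribute it to the workers.
    tempdir = tempfile.mkdtemp()
    path = os.path.join(tempdir, "test.txt")
    with open(path, "w") as testFile:
        testFile.write("100")
    sc.addFile(path)

    def func(iterator):
        # Read the multiplier from the distributed copy of the file,
        # so the result actually depends on it.
        with open(SparkFiles.get("test.txt")) as testFile:
            fileVal = int(testFile.readline())
        return [x * fileVal for x in iterator]

    print(sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect())
    # [100, 200, 300, 400]
    sc.stop()

Note that the doctest output [100, 200, 300, 400] is the same either way, since the file happens to contain 100. That is presumably why the bug went unnoticed: the doctest still passes even though fileVal is unused.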