  Spark / SPARK-3035

Wrong example with SparkContext.addFile


    Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Labels:
    • Target Version/s:

      Description

      "context.py"
      def addFile(self, path):
          """
          ...
          >>> from pyspark import SparkFiles
          >>> path = os.path.join(tempdir, "test.txt")
          >>> with open(path, "w") as testFile:
          ...    testFile.write("100")
          >>> sc.addFile(path)
          >>> def func(iterator):
          ...    with open(SparkFiles.get("test.txt")) as testFile:
          ...        fileVal = int(testFile.readline())
          ...        return [x * 100 for x in iterator]
          >>> sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect()
          [100, 200, 300, 400]
          """
      

      This example writes 100 to a temp file, distributes it with addFile, and then uses its value as the multiplier, to check that worker nodes can read the distributed file.

      But look at these lines; the result is never affected by the distributed file. fileVal is read but never used, and the doctest still passes only because the file happens to contain 100, the same constant the comprehension multiplies by:

          ...        fileVal = int(testFile.readline())
          ...        return [x * 100 for x in iterator]
      

      I'm sure this code was intended to be like this:

          ...        fileVal = int(testFile.readline())
          ...        return [x * fileVal for x in iterator]
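
      For reference, here is a minimal self-contained sketch of the corrected example as a standalone PySpark script rather than a doctest (sc and tempdir are doctest globals in the original docstring, so they are created explicitly here; the app name "addFile-example" is arbitrary):

          import os
          import tempfile

          from pyspark import SparkContext, SparkFiles

          sc = SparkContext("local", "addFile-example")

          # Write the multiplier to a local temp file and ship it to the executors.
          tempdir = tempfile.mkdtemp()
          path = os.path.join(tempdir, "test.txt")
          with open(path, "w") as testFile:
              testFile.write("100")
          sc.addFile(path)

          def func(iterator):
              # Read the distributed copy on the worker and actually use its value.
              with open(SparkFiles.get("test.txt")) as testFile:
                  fileVal = int(testFile.readline())
              return [x * fileVal for x in iterator]

          print(sc.parallelize([1, 2, 3, 4]).mapPartitions(func).collect())
          # [100, 200, 300, 400]
          sc.stop()

      With fileVal actually driving the multiplication, changing the contents of test.txt changes the output, which is what the docstring example is meant to demonstrate.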
      


            People

            • Assignee: Unassigned
            • Reporter: iAmGhost (Daehan Kim)
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
              • Updated:
              • Resolved: