Spark / SPARK-3119

Re-implement TorrentBroadcast

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark Core
    • Labels: None

    Description

      TorrentBroadcast is unnecessarily complicated:

      1. It tracks a lot of mutable state, such as the total number of bytes and the number of blocks fetched.
      2. It has at least two data structures that are not needed: TorrentInfo and TorrentBlock.
      3. It uses getSingle on executors to get the block instead of getLocal, resulting in an extra roundtrip to look up the location of the block when the block doesn't exist yet.
      4. It has a metadata block that is completely unnecessary.
      5. It does an extra memory copy during deserialization to copy all the blocks into a single giant array.
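
      Point 5 can be illustrated with a minimal sketch. It is written in plain Python rather than against Spark's code, so every name in it (blockify, unblockify, ChainedReader) is hypothetical and chosen only for illustration. The idea shown is to deserialize directly from a stream that walks the fetched blocks in order, instead of first copying them all into one large array:

```python
import io
import pickle


class ChainedReader(io.RawIOBase):
    """Read-only raw stream over a list of byte blocks (hypothetical helper).

    The blocks are never concatenated; reads simply move from one block
    to the next.
    """

    def __init__(self, blocks):
        super().__init__()
        self._blocks = [memoryview(b) for b in blocks]
        self._index = 0   # block currently being read
        self._offset = 0  # read position inside that block

    def readable(self):
        return True

    def readinto(self, buf):
        # Skip blocks that are exhausted (or empty).
        while (self._index < len(self._blocks)
               and self._offset >= len(self._blocks[self._index])):
            self._index += 1
            self._offset = 0
        if self._index >= len(self._blocks):
            return 0  # end of stream
        block = self._blocks[self._index]
        n = min(len(buf), len(block) - self._offset)
        buf[:n] = block[self._offset:self._offset + n]
        self._offset += n
        return n


def blockify(payload, block_size):
    """Split serialized bytes into fixed-size blocks (the last may be shorter)."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]


def unblockify(blocks):
    """Deserialize straight from the chained blocks, with no joined copy."""
    return pickle.load(io.BufferedReader(ChainedReader(blocks)))


if __name__ == "__main__":
    value = {"name": "lookup-table", "entries": list(range(1000))}
    blocks = blockify(pickle.dumps(value), block_size=64)
    assert unblockify(blocks) == value
```

      The sketch only covers the deserialization side; how blocks are stored, located, and fetched is outside its scope.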

            People

              Assignee: Reynold Xin (rxin)
              Reporter: Reynold Xin (rxin)
              Votes: 0
              Watchers: 2
