Spark / SPARK-6307

Executors fetch the same RDD block hundreds or thousands of times


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Block Manager, Spark Core
    • Labels: None
    • Environment: Linux, Spark Standalone 1.2, running in a PBS grid engine

    Description

      The block manager kept fetching the same blocks over and over, making tasks with network activity extremely slow. Two identical tasks can take anywhere from 12 seconds to more than an hour (at which point I stopped the job).

      Spark should cache the blocks so that it does not fetch the same ones over and over.

      Here is a simplified version of the code that provokes it:

      // Read a few thousand lines (~ 15 MB)
      val fileContents = sc.newAPIHadoopFile(path, ......).repartition(16)
      val data = fileContents.map{x => parseContent(x)}.cache()
      // Do a pairwise comparison and count the best pairs
      val pairs = data.cartesian(data).filter { case (x, y) =>
        similarity(x, y) > 0.9
      }
      pairs.count()
      

      This is a tiny fraction of one of the worker's stderr:

      15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_2 remotely
      15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_2 remotely
      15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_1 remotely
      15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_0 remotely
      
      [... thousands more lines, fetching the same 16 remote blocks ...]
      
      15/03/12 22:25:44 INFO BlockManager: Found block rdd_8_0 remotely
      15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
      15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
      15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
      15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
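
One plausible reading of the log above, not confirmed anywhere in this report: `cartesian` is, as far as I know, computed as a nested iteration over the two parent partitions, so the iterator for the right-hand partition is requested again for every element of the left-hand partition. If that right-hand block lives on another executor, each request may become a remote fetch. The toy model below uses no Spark at all. Every name in it (`CartesianFetchModel`, `fetchRemoteBlock`, `naive`, `fetchOnce`) is hypothetical and only illustrates the access pattern.

```scala
// Toy model of the access pattern; this is NOT Spark code.
object CartesianFetchModel {
  // Number of times the "remote" block has been requested.
  var fetchCount = 0

  // Stand-in for reading a cached block held by another executor.
  def fetchRemoteBlock(block: Vector[Int]): Iterator[Int] = {
    fetchCount += 1
    block.iterator
  }

  // Nested for-comprehension over iterators: the inner expression is
  // re-evaluated once per outer element, so the block is requested
  // left.size times.
  def naive(left: Vector[Int], right: Vector[Int]): Iterator[(Int, Int)] =
    for (x <- left.iterator; y <- fetchRemoteBlock(right)) yield (x, y)

  // Variant that materializes the right-hand block once and then
  // iterates over the local copy.
  def fetchOnce(left: Vector[Int], right: Vector[Int]): Iterator[(Int, Int)] = {
    val local = fetchRemoteBlock(right).toVector
    for (x <- left.iterator; y <- local.iterator) yield (x, y)
  }

  def main(args: Array[String]): Unit = {
    val left = Vector.range(0, 200)
    val right = Vector.range(0, 200)

    fetchCount = 0
    val naivePairs = naive(left, right).size
    println(s"naive: pairs=$naivePairs, fetches=$fetchCount")

    fetchCount = 0
    val oncePairs = fetchOnce(left, right).size
    println(s"fetch once: pairs=$oncePairs, fetches=$fetchCount")
  }
}
```

With 200 elements on each side, the naive variant requests the block 200 times and the second variant requests it once, while both produce the same 40000 pairs. Whether this is what actually happened on the cluster in this report is an assumption.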
      

      Details for that stage from the UI.

      • Total task time across all tasks: 11.9 h
      • Input: 2.2 GB
      • Shuffle read: 4.5 MB

      Summary Metrics for 176 Completed Tasks

      Metric                 Min     25th percentile  Median  75th percentile  Max
      Duration               7 s     8 s              8 s     12 s             59 min
      GC Time                0 ms    99 ms            0.1 s   0.2 s            0.5 s
      Input                  6.9 MB  8.2 MB           8.4 MB  9.0 MB           11.0 MB
      Shuffle Read (Remote)  0.0 B   0.0 B            0.0 B   0.0 B            676.6 KB

      Aggregated Metrics by Executor

      Executor ID  Address           Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input      Output  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
      0            n-62-23-3:49566   5.7 h      9            0             9                171.0 MB   0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
      1            n-62-23-6:57518   16.4 h     20           0             20               169.9 MB   0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
      2            n-62-18-48:33551  0 ms       0            0             0                169.6 MB   0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B
      3            n-62-23-5:58421   2.9 min    12           0             12               266.2 MB   0.0 B   4.5 MB        0.0 B          0.0 B                   0.0 B
      4            n-62-23-1:40096   23 min     164          0             164              1430.4 MB  0.0 B   0.0 B         0.0 B          0.0 B                   0.0 B

      Tasks

      Index  ID  Attempt  Status   Locality Level  Executor ID / Host  Launch Time          Duration  GC Time  Input             Shuffle Read  Errors
      1      2   0        SUCCESS  ANY             3 / n-62-23-5       2015/03/12 21:55:00  12 s      0.1 s    6.9 MB (memory)   676.6 KB
      0      1   0        SUCCESS  ANY             0 / n-62-23-3       2015/03/12 21:55:00  39 min    0.3 s    8.7 MB (network)  0.0 B
      4      5   0        SUCCESS  ANY             1 / n-62-23-6       2015/03/12 21:55:00  38 min    0.4 s    8.6 MB (network)  0.0 B
      3      4   0        RUNNING  ANY             2 / n-62-18-48      2015/03/12 21:55:00  55 min             8.3 MB (network)  0.0 B
      2      3   0        SUCCESS  ANY             4 / n-62-23-1       2015/03/12 21:55:00  11 s      0.3 s    8.4 MB (memory)   0.0 B
      7      8   0        SUCCESS  ANY             4 / n-62-23-1       2015/03/12 21:55:00  12 s      0.3 s    9.2 MB (memory)   0.0 B
      6      7   0        SUCCESS  ANY             3 / n-62-23-5       2015/03/12 21:55:00  12 s      0.1 s    8.1 MB (memory)   0.0 B
      5      6   0        SUCCESS  ANY             0 / n-62-23-3       2015/03/12 21:55:00  39 min    0.3 s    8.6 MB (network)  0.0 B
      9      10  0        RUNNING  ANY             1 / n-62-23-6       2015/03/12 21:55:00  55 min             8.7 MB (network)  0.0 B
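
Since the data set described here is small (about 15 MB), one possible workaround is to avoid `cartesian` entirely: collect the parsed records on the driver, broadcast them, and compare each record against the broadcast copy. This is only a sketch under assumptions. The object `BroadcastPairs` and the function `countSimilarPairs` are hypothetical names, the `similarity` function stands in for the one in the report, and the approach assumes the whole data set fits in memory on the driver and on every executor. It counts ordered pairs, including each record paired with itself, which matches what `data.cartesian(data).filter(...).count()` counts.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object BroadcastPairs {
  // Hypothetical helper: counts ordered pairs (x, y) whose similarity
  // exceeds `threshold`, without calling cartesian().
  // Assumption: `data` is small enough to collect and broadcast.
  def countSimilarPairs[T: ClassTag](
      sc: SparkContext,
      data: RDD[T],
      similarity: (T, T) => Double,
      threshold: Double): Long = {
    // Ship one read-only copy of the full data set to every executor.
    val all = sc.broadcast(data.collect())

    // Compare each record against the executor-local broadcast copy,
    // then add up the per-record match counts.
    data
      .map(x => all.value.count(y => similarity(x, y) > threshold).toLong)
      .fold(0L)(_ + _)
  }
}
```

A call might look like `BroadcastPairs.countSimilarPairs(sc, data, similarity _, 0.9)`, where `data` and `similarity` are the values from the snippet in the description. For data sets too large to broadcast, this sketch does not apply.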

            People

              Assignee: Unassigned
              Reporter: Tobias Bertelsen (tbertelsen)
              Votes: 1
              Watchers: 8
