  1. Spark
  2. SPARK-12691

Multiple unionAll calls on DataFrames get progressively slower.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0, 1.3.1, 1.4.0, 1.4.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Tested in Spark 1.3 and 1.4.

    Description

      Chaining multiple unionAll calls on DataFrames seems to cause repeated computation. The following sample code reproduces the issue.

      // Run in spark-shell, where sc and the toDF implicits are already in scope.
      // Build 101 small DataFrames with columns "A" and "B".
      val dfs = for (i <- 0 to 100) yield {
        sc.parallelize((0 to 10).zipWithIndex).toDF("A", "B")
      }

      var i = 1
      val s1 = System.currentTimeMillis()
      // Fold the DataFrames together pairwise, timing each unionAll call.
      dfs.reduce { (a, b) =>
        val t1 = System.currentTimeMillis()
        val dd = a unionAll b
        val t2 = System.currentTimeMillis()
        println("Round " + i + " unionAll took " + (t2 - t1) + " ms")
        i = i + 1
        dd
      }
      val s2 = System.currentTimeMillis()
      println((i - 1) + " unionAll took totally " + (s2 - s1) + " ms")

      It printed the output below. Each unionAll appears to redo all the previous unionAll calls, so it takes its own time plus the time of every earlier call. As a result, each round is slower than the one before it.

      Note that this behaviour does not occur if I directly union the RDDs underlying the DataFrames.
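      The RDD-level union mentioned above could look roughly like the sketch below. It is only an illustration, not the reporter's actual code: the helper name `unionViaRdds` is hypothetical, it assumes the Spark 1.3/1.4 API (`DataFrame.rdd`, `SparkContext.union`, `SQLContext.createDataFrame`), and it assumes every input DataFrame has the same schema as the first one.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper: unions the RDD[Row] behind each DataFrame in a single
// SparkContext.union call, then wraps the result in one DataFrame.
// Assumption: all DataFrames in dfs share the schema of dfs.head, and dfs is non-empty.
def unionViaRdds(sqlContext: SQLContext, dfs: Seq[DataFrame]): DataFrame = {
  val unionedRows = sqlContext.sparkContext.union(dfs.map(_.rdd))
  sqlContext.createDataFrame(unionedRows, dfs.head.schema)
}
```

      In spark-shell, with `dfs` built as in the sample code, this would be called as `val combined = unionViaRdds(sqlContext, dfs)`. The design choice is to perform one union over all inputs instead of 100 chained unionAll calls.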

      ----- output start ----
      Round 1 unionAll took 1 ms
      Round 2 unionAll took 1 ms
      Round 3 unionAll took 1 ms
      Round 4 unionAll took 1 ms
      Round 5 unionAll took 1 ms
      Round 6 unionAll took 1 ms
      Round 7 unionAll took 1 ms
      Round 8 unionAll took 2 ms
      Round 9 unionAll took 2 ms
      Round 10 unionAll took 2 ms
      Round 11 unionAll took 3 ms
      Round 12 unionAll took 3 ms
      Round 13 unionAll took 3 ms
      Round 14 unionAll took 3 ms
      Round 15 unionAll took 3 ms
      Round 16 unionAll took 4 ms
      Round 17 unionAll took 4 ms
      Round 18 unionAll took 4 ms
      Round 19 unionAll took 4 ms
      Round 20 unionAll took 4 ms
      Round 21 unionAll took 5 ms
      Round 22 unionAll took 5 ms
      Round 23 unionAll took 5 ms
      Round 24 unionAll took 5 ms
      Round 25 unionAll took 5 ms
      Round 26 unionAll took 6 ms
      Round 27 unionAll took 6 ms
      Round 28 unionAll took 6 ms
      Round 29 unionAll took 6 ms
      Round 30 unionAll took 6 ms
      Round 31 unionAll took 6 ms
      Round 32 unionAll took 7 ms
      Round 33 unionAll took 7 ms
      Round 34 unionAll took 7 ms
      Round 35 unionAll took 7 ms
      Round 36 unionAll took 7 ms
      Round 37 unionAll took 8 ms
      Round 38 unionAll took 8 ms
      Round 39 unionAll took 8 ms
      Round 40 unionAll took 8 ms
      Round 41 unionAll took 9 ms
      Round 42 unionAll took 9 ms
      Round 43 unionAll took 9 ms
      Round 44 unionAll took 9 ms
      Round 45 unionAll took 9 ms
      Round 46 unionAll took 9 ms
      Round 47 unionAll took 9 ms
      Round 48 unionAll took 9 ms
      Round 49 unionAll took 10 ms
      Round 50 unionAll took 10 ms
      Round 51 unionAll took 10 ms
      Round 52 unionAll took 10 ms
      Round 53 unionAll took 11 ms
      Round 54 unionAll took 11 ms
      Round 55 unionAll took 11 ms
      Round 56 unionAll took 12 ms
      Round 57 unionAll took 12 ms
      Round 58 unionAll took 12 ms
      Round 59 unionAll took 12 ms
      Round 60 unionAll took 12 ms
      Round 61 unionAll took 12 ms
      Round 62 unionAll took 13 ms
      Round 63 unionAll took 13 ms
      Round 64 unionAll took 13 ms
      Round 65 unionAll took 13 ms
      Round 66 unionAll took 14 ms
      Round 67 unionAll took 14 ms
      Round 68 unionAll took 14 ms
      Round 69 unionAll took 14 ms
      Round 70 unionAll took 14 ms
      Round 71 unionAll took 14 ms
      Round 72 unionAll took 14 ms
      Round 73 unionAll took 14 ms
      Round 74 unionAll took 15 ms
      Round 75 unionAll took 15 ms
      Round 76 unionAll took 15 ms
      Round 77 unionAll took 15 ms
      Round 78 unionAll took 16 ms
      Round 79 unionAll took 16 ms
      Round 80 unionAll took 16 ms
      Round 81 unionAll took 16 ms
      Round 82 unionAll took 17 ms
      Round 83 unionAll took 17 ms
      Round 84 unionAll took 17 ms
      Round 85 unionAll took 17 ms
      Round 86 unionAll took 17 ms
      Round 87 unionAll took 18 ms
      Round 88 unionAll took 17 ms
      Round 89 unionAll took 18 ms
      Round 90 unionAll took 18 ms
      Round 91 unionAll took 18 ms
      Round 92 unionAll took 18 ms
      Round 93 unionAll took 18 ms
      Round 94 unionAll took 19 ms
      Round 95 unionAll took 19 ms
      Round 96 unionAll took 20 ms
      Round 97 unionAll took 20 ms
      Round 98 unionAll took 20 ms
      Round 99 unionAll took 20 ms
      Round 100 unionAll took 20 ms
      100 unionAll took totally 1337 ms

      ----- output end ----

    People

      Assignee: Unassigned
      Reporter: Allen Liang (lliang)
      Votes: 0
      Watchers: 5
