Spark / SPARK-21177

df.saveAsTable slows down linearly with the number of appends


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      In short, the following spark-shell transcript reproduces the issue.

      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
            /_/
               
      Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> def printTimeTaken(str: String, f: () => Unit) {
           |   val start = System.nanoTime()
           |   f()
           |   val end = System.nanoTime()
           |   val timetaken = end - start
           |   import scala.concurrent.duration._
           |   println(s"Time taken for $str is ${timetaken.nanos.toMillis}\n")
           | }
      printTimeTaken: (str: String, f: () => Unit)Unit
      
      scala> for (i <- 1 to 100000) { printTimeTaken("time to append to hive:", () => { Seq(1, 2).toDF().write.mode("append").saveAsTable("t1") }) }
      Time taken for time to append to hive: is 284
      
      Time taken for time to append to hive: is 211
      
      ...
      ...
      
      Time taken for time to append to hive: is 2615
      
      ...
      Time taken for time to append to hive: is 3055
      ...
      Time taken for time to append to hive: is 22425
      
      ....
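      For reuse outside the REPL, the timing helper above can be written as a small self-contained object; this is a sketch, not part of the original report, and it additionally returns the elapsed milliseconds so the measurement can be checked programmatically:

```scala
object Timing {
  // Runs f, prints the elapsed time labelled by str, and returns
  // both f's result and the elapsed time in milliseconds.
  def timeTaken[A](str: String)(f: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = f
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"Time taken for $str is $elapsedMs ms")
    (result, elapsedMs)
  }
}
```

      Passing the body as a by-name parameter (`f: => A`) instead of a `() => Unit` function makes call sites read like a block: `Timing.timeTaken("append") { df.write.mode("append").saveAsTable("t1") }`.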
      

      Why does it matter?

      In a streaming job, this makes it impractical to append to Hive using this DataFrame operation, since each successive append takes longer than the last.
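      The timings above are consistent with a per-append cost that grows with the number of appends already performed (for example, if each append re-lists every file written so far). Under that assumption, the cumulative runtime of N appends grows quadratically. A toy model illustrating this (an illustration only, not Spark code):

```scala
// Toy model: suppose append i costs i * perAppendCost time units
// (e.g. proportional to the i files already in the table). Then the
// cumulative cost of n appends is 1 + 2 + ... + n = n(n+1)/2,
// i.e. quadratic in the number of appends.
def cumulativeCost(n: Int, perAppendCost: Double = 1.0): Double =
  (1 to n).map(_ * perAppendCost).sum
```

      So even a modest linear slowdown per append makes a long-running streaming job's total write time blow up quadratically.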


          People

            Assignee: Unassigned
            Reporter: Prashant Sharma (prashant)
            Votes: 2
            Watchers: 4
