Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25306

Avoid skewed filter trees to speed up `createFilter` in ORC

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.3, 2.0.2, 2.1.3, 2.2.2, 2.3.1, 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      In both ORC data sources, createFilter function has exponential time complexity due to its skewed filter tree generation. This PR aims to improve it by using new buildTree function.

      REPRODUCE

      // Create and read 1 row table with 1000 columns
      sql("set spark.sql.orc.filterPushdown=true")
      val selectExpr = (1 to 1000).map(i => s"id c$i")
      spark.range(1).selectExpr(selectExpr: _*).write.mode("overwrite").orc("/tmp/orc")
      print(s"With 0 filters, ")
      spark.time(spark.read.orc("/tmp/orc").count)
      
      // Increase the number of filters
      (20 to 30).foreach { width =>
        val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
        print(s"With $width filters, ")
        spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
      }
      

      RESULT

      With 0 filters, Time taken: 653 ms                                              
      With 20 filters, Time taken: 962 ms
      With 21 filters, Time taken: 1282 ms
      With 22 filters, Time taken: 1982 ms
      With 23 filters, Time taken: 3855 ms
      With 24 filters, Time taken: 6719 ms
      With 25 filters, Time taken: 12669 ms
      With 26 filters, Time taken: 25032 ms
      With 27 filters, Time taken: 49585 ms
      With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
      With 29 filters, Time taken: 198368 ms   // over 3 mins
      With 30 filters, Time taken: 393744 ms   // over 6 mins
      

        Attachments

          Activity

            People

            • Assignee:
              dongjoon Dongjoon Hyun
              Reporter:
              dongjoon Dongjoon Hyun
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: