Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-24288

Enable preventing predicate pushdown

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.3.0
    • 2.4.0
    • SQL
    • None

    Description

      Issue discussed on Mailing List: http://apache-spark-developers-list.1001551.n3.nabble.com/Preventing-predicate-pushdown-td23976.html

      While working with JDBC datasource I saw that many "or" clauses with
      non-equality operators causes huge performance degradation of SQL query
      to database (DB2). For example:

      val df = spark.read.format("jdbc").(other options to parallelize
      load).load()

      df.where(s"(date1 > $param1 and (date1 < $param2 or date1 is null) or x
       > 100)").show() // in real application whose predicates were pushed
      many many lines below, many ANDs and ORs

      If I use cache() before where, there is no predicate pushdown of this
      "where" clause. However, in production system caching many sources is a
      waste of memory (especially is pipeline is long and I must do cache many
      times).There are also few more workarounds, but it would be great if Spark will support preventing predicate pushdown by user.

       

      For example: df.withAnalysisBarrier().where(...) ?

       

      Note, that this should not be a global configuration option. If I read 2 DataFrames, df1 and df2, I would like to specify that df1 should not have some predicates pushed down, but some may be, but df2 should have all predicates pushed down, even if target query joins df1 and df2. As far as I understand Spark optimizer, if we use functions like `withAnalysisBarrier` and put AnalysisBarrier explicitly in logical plan, then predicates won't be pushed down on this particular DataFrames and PP will be still possible on the second one.

      Update: The solution is to add a JDBC Option "pushDownPredicate" (default true) to allow/disallow predicate push-down in JDBC data source.

      Attachments

        Activity

          People

            maryannxue Wei Xue
            TomaszGaweda Tomasz Gawęda
            Votes:
            2 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: