  1. Spark
  2. SPARK-18593

JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.2, 1.6.3
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels:

      Description

      In Apache Spark 1.6.x, JDBCRDD returns incorrect results for queries that filter on a PostgreSQL CHAR column. The root cause is that PostgreSQL returns space-padded strings for CHAR columns, so the post-processing filter `Filter (a#0 = A)`, which Spark applies to the rows coming back from the database, compares the padded value with the unpadded literal and evaluates to false. Spark 2.0.0 removes the post-processing filter because the predicate is already handled in the database by `PushedFilters: [EqualTo(a,A)]`.

      scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
      t_char: org.apache.spark.sql.DataFrame = [a: string]
      
      scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
      t_varchar: org.apache.spark.sql.DataFrame = [a: string]
      
      scala> t_char.show
      +----------+
      |         a|
      +----------+
      |A         |
      |AA        |
      |AAA       |
      +----------+
      
      
      scala> t_varchar.show
      +---+
      |  a|
      +---+
      |  A|
      | AA|
      |AAA|
      +---+
      
      
      scala> t_char.filter(t_char("a")==="A").show
      +---+
      |  a|
      +---+
      +---+
      
      
      scala> t_char.filter(t_char("a")==="A         ").show
      +----------+
      |         a|
      +----------+
      |A         |
      +----------+
      
      
      scala> t_varchar.filter(t_varchar("a")==="A").show
      +---+
      |  a|
      +---+
      |  A|
      +---+
      
      
      scala> t_char.filter(t_char("a")==="A").explain
      == Physical Plan ==
      Filter (a#0 = A)
      +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
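
The mismatch described above can be modelled without Spark or a database. The sketch below is only an illustration under stated assumptions: `CharFilterSketch`, `pad`, `pushedFilter` and `postFilter` are hypothetical names, not Spark APIs, and the width of 10 is inferred from the `t_char.show` output. It relies on PostgreSQL treating trailing spaces as insignificant when comparing CHAR values, while the Spark-side filter does an exact string comparison.

```scala
// Hypothetical, Spark-free model of the two filtering stages; all names are illustrative.
object CharFilterSketch {
  // Assumed column width, inferred from the padded values in the t_char output.
  val Width = 10

  // Right-pad a value with spaces, the way a CHAR(10) value comes back from PostgreSQL.
  def pad(s: String): String = s + " " * (Width - s.length)

  // Remove trailing spaces only; leading spaces stay significant.
  def stripTrailing(s: String): String = s.replaceAll(" +$", "")

  // Rows as the database would return them for the CHAR column.
  val storedRows: Seq[String] = Seq("A", "AA", "AAA").map(pad)

  // Stage 1: the pushed-down predicate, evaluated by the database.
  // Trailing spaces are ignored in the comparison.
  def pushedFilter(rows: Seq[String], literal: String): Seq[String] =
    rows.filter(r => stripTrailing(r) == stripTrailing(literal))

  // Stage 2: the post-processing filter applied by Spark 1.6.x, an exact comparison.
  def postFilter(rows: Seq[String], literal: String): Seq[String] =
    rows.filter(_ == literal)

  def main(args: Array[String]): Unit = {
    val fromDatabase = pushedFilter(storedRows, "A")
    println(fromDatabase.size)                       // 1: the database matches the padded row
    println(postFilter(fromDatabase, "A").size)      // 0: the exact comparison drops it
    println(postFilter(fromDatabase, pad("A")).size) // 1: matches only when the literal is padded
  }
}
```

In this model, skipping `postFilter` and returning the output of `pushedFilter` directly corresponds to the Spark 2.0.0 behaviour described in this issue.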
      

            People

            • Assignee: Takeshi Yamamuro (maropu)
            • Reporter: Durga Prasad Gunturu (DurgaPrasad16)