SPARK-19912

String literals are not escaped while performing Hive metastore level partition pruning


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1, 2.2.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL

    Description

      Shim_v0_13.convertFilters() doesn't escape string literals while generating Hive-style partition predicates.
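
      For illustration only, a hypothetical sketch (not the actual Shim_v0_13.convertFilters() code; the names below are made up) of how quoting a literal without escaping embedded quotes lets the value rewrite the filter string handed to the metastore:

        // Hypothetical sketch of the unescaped-literal problem described above;
        // not the real Shim_v0_13 implementation.
        object UnescapedPredicateSketch {
          // Wraps the value in double quotes but never escapes quotes inside it.
          def partitionPredicate(column: String, value: String): String =
            s"""$column = "$value""""

          def main(args: Array[String]): Unit = {
            // Ordinary literal: a single condition, as intended.
            println(partitionPredicate("p", "p1"))                // p = "p1"

            // Literal containing double quotes: the generated filter string now
            // reads as two conditions, p = "p1" and q = "q1".
            println(partitionPredicate("p", "p1\" and q = \"q1"))
          }
        }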

      The following SQL-injection-like test case illustrates this issue:

        test("SPARK-19912") {
          withTable("spark_19912") {
            Seq(
              (1, "p1", "q1"),
              (2, "p1\" and q=\"q1", "q2")
            ).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("spark_19912")
      
            checkAnswer(
              spark.table("foo").filter($"p" === "p1\" and q = \"q1").select($"a"),
              Row(2)
            )
          }
        }
      

      The above test case fails like this:

      [info] - spark_19912 *** FAILED *** (13 seconds, 74 milliseconds)
      [info]   Results do not match for query:
      [info]   Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
      [info]   Timezone Env:
      [info]
      [info]   == Parsed Logical Plan ==
      [info]   'Project [unresolvedalias('a, None)]
      [info]   +- Filter (p#27 = p1" and q = "q1)
      [info]      +- SubqueryAlias spark_19912
      [info]         +- Relation[a#26,p#27,q#28] parquet
      [info]
      [info]   == Analyzed Logical Plan ==
      [info]   a: int
      [info]   Project [a#26]
      [info]   +- Filter (p#27 = p1" and q = "q1)
      [info]      +- SubqueryAlias spark_19912
      [info]         +- Relation[a#26,p#27,q#28] parquet
      [info]
      [info]   == Optimized Logical Plan ==
      [info]   Project [a#26]
      [info]   +- Filter (isnotnull(p#27) && (p#27 = p1" and q = "q1))
      [info]      +- Relation[a#26,p#27,q#28] parquet
      [info]
      [info]   == Physical Plan ==
      [info]   *Project [a#26]
      [info]   +- *FileScan parquet default.spark_19912[a#26,p#27,q#28] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(p#27), (p#27 = p1" and q = "q1)], PushedFilters: [], ReadSchema: struct<a:int>
      [info]   == Results ==
      [info]   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
      [info]    struct<>                   struct<>
      [info]   ![2]
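
      When the filter on p is pushed to the metastore, the string literal p1" and q = "q1 is embedded without escaping, so the filter string effectively reads p = "p1" and q = "q1". The metastore therefore never returns the partition that actually holds row 2; the scan ends up with PartitionCount: 0, and the query returns no rows instead of Row(2).

      One common way to avoid this kind of injection is to quote the literal with a quote character it does not contain and to give up on metastore-level pruning otherwise. The sketch below is a hypothetical illustration of that idea (quoteStringLiteral and the surrounding object are made-up names), not necessarily the change that shipped for this issue:

        // Hypothetical sketch of a safer literal-to-filter-string conversion;
        // the actual fix for this issue may differ.
        object QuoteLiteralSketch {
          def quoteStringLiteral(value: String): String =
            if (!value.contains("\"")) s""""$value""""   // wrap in double quotes
            else if (!value.contains("'")) s"'$value'"   // fall back to single quotes
            else throw new UnsupportedOperationException(
              "Partition filter cannot have both \" and ' characters")

          def main(args: Array[String]): Unit = {
            // Prints  p = 'p1" and q = "q1'  -- the embedded double quotes can no
            // longer terminate the literal.
            println("p = " + quoteStringLiteral("p1\" and q = \"q1"))
          }
        }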
      

People

    • Assignee: Dongjoon Hyun
    • Reporter: Cheng Lian
    • Votes: 0
    • Watchers: 3
