Spark / SPARK-41666

Support parameterized SQL in PySpark


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Enhance the PySpark SQL API with support for parameterized SQL statements to improve security and reusability. Application developers will be able to write SQL with parameter markers whose values will be passed separately from the SQL code and interpreted as literals. This will help prevent SQL injection attacks for applications that generate SQL based on a user’s selections, which is often done via a user interface.
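      As a plain-Python illustration of the injection risk described above (the table and parameter names are hypothetical, not from the ticket), compare naive string interpolation with passing the value separately:

      ```python
      # Attacker-controlled value coming from a UI selection.
      user_input = "0 OR 1=1"

      # Naive interpolation: the input becomes part of the SQL text itself,
      # changing the meaning of the query.
      unsafe_query = f"SELECT id FROM t WHERE id = {user_input}"

      # With a parameter marker, the value travels separately from the SQL
      # code and is interpreted as a literal, never parsed as SQL.
      safe_query = "SELECT id FROM t WHERE id = @param1"
      safe_params = {"param1": user_input}
      ```

      The unsafe variant yields `SELECT id FROM t WHERE id = 0 OR 1=1`, which matches every row; the parameterized variant keeps the query text fixed.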

      PySpark already supports formatting of sqlText using the {...} syntax. The API should be left the same:

      def sql(self, sqlQuery: str, **kwargs: Any) -> DataFrame:

      and the new parameters should be supported via the same API.

      PySpark's sql() should pass unused parameters to the JVM side, where the Java sql() method handles them. For example:

      >>> mydf = spark.range(10)
      >>> spark.sql("SELECT id FROM {mydf} WHERE id % @param1 = 0", mydf=mydf, param1='3').show()
      +---+
      | id|
      +---+
      |  0|
      |  3|
      |  6|
      |  9|
      +---+
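      The dispatch described above (format the {...} markers in Python, forward the leftover keyword arguments as SQL parameters) can be sketched in plain Python. This is a simplified illustration of the idea, not Spark's actual implementation, and `split_sql_kwargs` is a hypothetical helper:

      ```python
      import string

      def split_sql_kwargs(sql_query: str, **kwargs):
          """Split kwargs into format arguments (consumed by {name} markers
          in the query text) and leftover parameters to be handed to the
          JVM-side sql() method."""
          # Names referenced by {...} markers in the query text.
          fields = {name for _, name, _, _ in
                    string.Formatter().parse(sql_query) if name}
          fmt_args = {k: v for k, v in kwargs.items() if k in fields}
          params = {k: v for k, v in kwargs.items() if k not in fields}
          return fmt_args, params

      fmt, params = split_sql_kwargs(
          "SELECT id FROM {mydf} WHERE id % @param1 = 0",
          mydf="<df>", param1="3")
      # fmt holds {"mydf": "<df>"} for Python-side formatting;
      # params holds {"param1": "3"} for the JVM-side sql() call.
      ```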
      


          People

            Assignee: Max Gekk
            Reporter: Max Gekk
            Votes: 0
            Watchers: 2
