SPARK-36720

In overwrite mode, setting the truncate option to true doesn't truncate the table


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      I'm using PySpark in an AWS Glue job to write a DataFrame to SAP HANA over JDBC. Our requirement is to truncate the target table and then load the data into HANA.

      I've tried both of the options below, and in both cases the stack trace shows Spark trying to drop the table, which is not allowed by our security design.

      # Option 1: DataFrameWriter options (note: "truncate" is only honored
      # when the save mode is "overwrite", not "append")
      df_lake.write.format("jdbc") \
          .option("url", edw_jdbc_url) \
          .option("driver", "com.sap.db.jdbc.Driver") \
          .option("dbtable", edw_jdbc_db_table) \
          .option("user", edw_jdbc_userid) \
          .option("password", edw_jdbc_password) \
          .option("truncate", "true") \
          .mode("append") \
          .save()

      # Option 2: write.jdbc with a properties dict
      properties = {"user": edw_jdbc_userid, "password": edw_jdbc_password, "truncate": "true"}
      df_lake.write.jdbc(url=edw_jdbc_url, table=edw_jdbc_db_table, mode="overwrite", properties=properties)
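
      Per the Spark JDBC data source documentation, the truncate option only takes effect with SaveMode.Overwrite, and even then Spark may still drop the table (see the analysis after the stack trace below). As a workaround sketch, not an official recommendation: issue the TRUNCATE directly over JDBC and then append, so Spark never needs to drop the table. This goes through the py4j gateway (an internal API) and assumes the HANA driver is on the driver JVM's classpath, which it must already be for the writes above to work:

      # Truncate the target table with a plain JDBC statement.
      jvm = spark.sparkContext._jvm  # py4j handle to the driver JVM (internal API)
      conn = jvm.java.sql.DriverManager.getConnection(
          edw_jdbc_url, edw_jdbc_userid, edw_jdbc_password)
      try:
          stmt = conn.createStatement()
          stmt.executeUpdate("TRUNCATE TABLE " + edw_jdbc_db_table)
          stmt.close()
      finally:
          conn.close()

      # Append into the now-empty table; no DROP is ever issued.
      df_lake.write.jdbc(url=edw_jdbc_url, table=edw_jdbc_db_table,
                         mode="append", properties=properties)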


      I've verified that the schemas match: I read the HANA table back over JDBC and printed its schema, and printed the schema of the source DataFrame as well.
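
      For reference, a minimal sketch of that check, assuming the same connection variables as in the write attempts above:

      df_hana = spark.read.format("jdbc") \
          .option("url", edw_jdbc_url) \
          .option("driver", "com.sap.db.jdbc.Driver") \
          .option("dbtable", edw_jdbc_db_table) \
          .option("user", edw_jdbc_userid) \
          .option("password", edw_jdbc_password) \
          .load()
      df_hana.printSchema()  # "Schema from HANA" below
      df_lake.printSchema()  # "Schema from the source table" below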

      Schema from HANA:
      root
       |-- RTL_ACCT_ID: long (nullable = true)
       |-- FINE_DINING_PROPOSED: string (nullable = true)
       |-- FINE_WINE_PROPOSED: string (nullable = true)
       |-- FINE_WINE_INF_PROPOSED: string (nullable = true)
       |-- GOLD_SILVER_PROPOSED: string (nullable = true)
       |-- PREMIUM_PROPOSED: string (nullable = true)
       |-- GSP_PROPOSED: string (nullable = true)
       |-- PROPOSED_CRAFT: string (nullable = true)
       |-- FW_REASON: string (nullable = true)
       |-- FWI_REASON: string (nullable = true)
       |-- GS_REASON: string (nullable = true)
       |-- PREM_REASON: string (nullable = true)
       |-- FD_REASON: string (nullable = true)
       |-- CRAFT_REASON: string (nullable = true)
       |-- GSP_FLAG: string (nullable = true)
       |-- GSP_REASON: string (nullable = true)
       |-- ELIGIBILITY: string (nullable = true)
       |-- DW_LD_S: timestamp (nullable = true)

      Schema from the source table:
      root
       |-- RTL_ACCT_ID: long (nullable = true)
       |-- FINE_DINING_PROPOSED: string (nullable = true)
       |-- FINE_WINE_PROPOSED: string (nullable = true)
       |-- FINE_WINE_INF_PROPOSED: string (nullable = true)
       |-- GOLD_SILVER_PROPOSED: string (nullable = true)
       |-- PREMIUM_PROPOSED: string (nullable = true)
       |-- GSP_PROPOSED: string (nullable = true)
       |-- PROPOSED_CRAFT: string (nullable = true)
       |-- FW_REASON: string (nullable = true)
       |-- FWI_REASON: string (nullable = true)
       |-- GS_REASON: string (nullable = true)
       |-- PREM_REASON: string (nullable = true)
       |-- FD_REASON: string (nullable = true)
       |-- CRAFT_REASON: string (nullable = true)
       |-- GSP_FLAG: string (nullable = true)
       |-- GSP_REASON: string (nullable = true)
       |-- ELIGIBILITY: string (nullable = true)
       |-- DW_LD_S: timestamp (nullable = true)

      This is the stack trace:
      py4j.protocol.Py4JJavaError: An error occurred while calling o169.jdbc.
      : com.sap.db.jdbc.exceptions.JDBCDriverException: SAP DBTech JDBC: [258]: insufficient privilege: Detailed info for this error can be found with guid 'xxxx'
      at com.sap.db.jdbc.exceptions.SQLExceptionSapDB._newInstance(SQLExceptionSapDB.java:191)
      at com.sap.db.jdbc.exceptions.SQLExceptionSapDB.newInstance(SQLExceptionSapDB.java:42)
      at com.sap.db.jdbc.packet.HReplyPacket._buildExceptionChain(HReplyPacket.java:976)
      at com.sap.db.jdbc.packet.HReplyPacket.getSQLExceptionChain(HReplyPacket.java:157)
      at com.sap.db.jdbc.packet.HPartInfo.getSQLExceptionChain(HPartInfo.java:39)
      at com.sap.db.jdbc.ConnectionSapDB._receive(ConnectionSapDB.java:3476)
      at com.sap.db.jdbc.ConnectionSapDB.exchange(ConnectionSapDB.java:1568)
      at com.sap.db.jdbc.StatementSapDB._executeDirect(StatementSapDB.java:1435)
      at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:1414)
      at com.sap.db.jdbc.StatementSapDB._execute(StatementSapDB.java:1399)
      at com.sap.db.jdbc.StatementSapDB._executeUpdate(StatementSapDB.java:1387)
      at com.sap.db.jdbc.StatementSapDB.executeUpdate(StatementSapDB.java:175)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.executeStatement(JdbcUtils.scala:993)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.dropTable(JdbcUtils.scala:93)
      at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:61)
      at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
      at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
      at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
      at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
      at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
      at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
      at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
      at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
      at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
      at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
      at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
      at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
      at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
      at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
      at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
      at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
      at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
      at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:817)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:498)
      at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
      at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
      at py4j.Gateway.invoke(Gateway.java:282)
      at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
      at py4j.commands.CallCommand.execute(CallCommand.java:79)
      at py4j.GatewayConnection.run(GatewayConnection.java:238)
      at java.lang.Thread.run(Thread.java:748)
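
      The frames JdbcUtils.dropTable (JdbcUtils.scala:93) called from JdbcRelationProvider.createRelation (JdbcRelationProvider.scala:61) show that Spark took the drop-and-recreate path instead of issuing TRUNCATE. A likely explanation: Spark only truncates when the JDBC dialect positively reports that TRUNCATE TABLE does not cascade, and since Spark 3.1 ships no built-in dialect for SAP HANA, that check comes back unknown and Spark falls back to DROP/CREATE. A rough Python paraphrase of the overwrite branch (illustrative only, not the actual Scala source; the helper names here are placeholders):

      # Paraphrase of the SaveMode.Overwrite handling in
      # JdbcRelationProvider.createRelation (Spark 3.1), for illustration.
      def overwrite(options, table_exists):
          if table_exists and options.is_truncate and \
                  is_cascading_truncate_table(options.url) is False:
              # Dialect explicitly says TRUNCATE is safe: keep the table.
              execute_update("TRUNCATE TABLE " + options.table)
          else:
              # For an unknown dialect (e.g. HANA) the answer is "unknown",
              # so Spark drops and recreates the table instead.
              execute_update("DROP TABLE " + options.table)
              execute_update(create_table_ddl(options))
          write_rows()  # the DataFrame rows are then inserted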

People

    Assignee: Unassigned
    Reporter: Balaji Balasubramaniam (balajiit)
    Votes: 0
    Watchers: 1
