Description
When a large number of new partitions is added by an INSERT query on a partitioned datasource table, the query sometimes fails with:
An error was encountered:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out;
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798)
  at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137)
This happens because adding thousands of partitions in a single metastore call takes a long time, and the client eventually times out.
Adding a very large number of partitions in one call can also lead to an OOM in the Hive Metastore (a similar issue in the recover-partitions flow has already been fixed).
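As a stopgap, the metastore client's read timeout can be raised so a single large createPartitions call has more time to finish; this only postpones the timeout and does not help the metastore-side memory pressure. A minimal sketch, assuming the Hive property hive.metastore.client.socket.timeout is honored by the metastore client in your deployment and that 1800s is an acceptable value (both are assumptions, not part of this issue):

// Sketch only: raise the Hive metastore client socket timeout for this session.
// The property name comes from Hive; the value and the spark.hadoop.* passthrough
// are assumptions about the deployment, not a recommendation from this issue.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.hadoop.hive.metastore.client.socket.timeout", "1800s")
  .getOrCreate()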
Steps to reproduce:
case class Partition(data: Int, partition_key: Int)

val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x, x)).toDF
df.registerTempTable("temp_table")

spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT)
  USING parquet
  PARTITIONED BY (partition_key)""")

spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect()
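For comparison, registering the same partitions in smaller chunks keeps each metastore call short. The sketch below only illustrates the batching idea against the reproduction table; the batch size of 100 and the use of ALTER TABLE ... ADD IF NOT EXISTS PARTITION from SQL are assumptions for illustration, not the change proposed here:

// Sketch: register partition_key values 1..15000 in batches of 100 so that
// no single metastore RPC is large enough to hit the client read timeout.
val batchSize = 100  // assumed value; tune for your metastore

(1 to 15000).grouped(batchSize).foreach { batch =>
  val partitionSpecs = batch
    .map(v => s"PARTITION (partition_key=$v)")
    .mkString(" ")
  // One ALTER TABLE statement per batch of partition specs.
  spark.sql(s"ALTER TABLE test_table ADD IF NOT EXISTS $partitionSpecs")
}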