Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-29938

Add batching in alter table add partition flow

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.3.4, 2.4.4
    • 3.0.0
    • SQL
    • None

    Description

      When lot of new partitions are added by an Insert query on a partitioned datasource table, sometimes the query fails with -

      An error was encountered: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; at
      org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at
      org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) at
      org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) at
      org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) at
      org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137)
      

      This happens because adding thousands of partition in a single call takes lot of time and the client eventually timesout.

      Also adding lot of partitions can lead to OOM in Hive Metastore (similar issue in recover partition flow fixed).

      Steps to reproduce -

      case class Partition(data: Int, partition_key: Int)
      val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x,x)).toDF
      df.registerTempTable("temp_table")
      
      spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT) USING parquet PARTITIONED BY (partition_key) """)
      spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect()
      

      Attachments

        Activity

          People

            prakharjain09 Prakhar Jain
            prakharjain09 Prakhar Jain
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: