HIVE-27325

Iceberg: Expiring old snapshots deletes files with DirectExecutorService causing runtime delays

Details

    Description

      Expiring old snapshots takes a long time because FileCleanupStrategy internally uses a DirectExecutorService, so every file deletion runs sequentially on the calling thread. This is a placeholder ticket to track the fix. If the problem is fixed in Iceberg, the Iceberg library needs to be upgraded here.

      insert into store_sales_delete_9 select *, current_timestamp() as ts from tpcds_1000_update.ssv;
      
      ALTER TABLE store_sales_delete_9 EXECUTE expire_snapshots('2023-05-09 00:00:00');
      
      
      
      	at org.apache.iceberg.relocated.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
      	at org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:300)
      	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:194)
      	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
      	at org.apache.iceberg.FileCleanupStrategy.deleteFiles(FileCleanupStrategy.java:84)
      	at org.apache.iceberg.IncrementalFileCleanup.cleanFiles(IncrementalFileCleanup.java:262)
      	at org.apache.iceberg.RemoveSnapshots.cleanExpiredSnapshots(RemoveSnapshots.java:338)
      	at org.apache.iceberg.RemoveSnapshots.commit(RemoveSnapshots.java:312)
      	at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.executeOperation(HiveIcebergStorageHandler.java:560)
      	at org.apache.hadoop.hive.ql.metadata.Hive.alterTableExecuteOperation(Hive.java:6844)
      	at org.apache.hadoop.hive.ql.ddl.table.execute.AlterTableExecuteOperation.execute(AlterTableExecuteOperation.java:37)
      	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
      	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360)
      	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:333)
      	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:250)
      	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:111)
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:547)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:541)
      	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
      	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
      	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
      	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
      	at java.security.AccessController.doPrivileged(java.base@11.0.19/Native Method)
      	at javax.security.auth.Subject.doAs(java.base@11.0.19/Subject.java:423)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
      	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
      	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
      	at java.util.concurrent.FutureTask.run(java.base@11.0.19/FutureTask.java:264)
      	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.19/Executors.java:515)
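
The trace above shows the delete tasks being submitted from the query thread itself. As a rough illustration of why that is slow, the following sketch compares a direct executor with a fixed thread pool using only the JDK. It does not use any Iceberg or Hive classes: `DeleteExecutorSketch`, `DirectExecutor`, `deleteAll` and `simulateDelete` are hypothetical names, and the 20 ms sleep is an assumed stand-in for a remote file delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DeleteExecutorSketch {

  /** Hypothetical stand-in for a direct executor: each task runs on the submitting thread. */
  public static final class DirectExecutor extends AbstractExecutorService {
    private volatile boolean shutdown;

    @Override public void execute(Runnable task) { task.run(); }
    @Override public void shutdown() { shutdown = true; }
    @Override public List<Runnable> shutdownNow() { shutdown = true; return new ArrayList<>(); }
    @Override public boolean isShutdown() { return shutdown; }
    @Override public boolean isTerminated() { return shutdown; }
    @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return true; }
  }

  /** Submits one simulated delete per path and returns the names of the threads that ran them. */
  public static Set<String> deleteAll(List<String> paths, ExecutorService executor)
      throws Exception {
    Set<String> workerThreads = ConcurrentHashMap.newKeySet();
    List<Future<?>> futures = new ArrayList<>();
    for (String path : paths) {
      futures.add(executor.submit(() -> {
        workerThreads.add(Thread.currentThread().getName());
        simulateDelete(path);
      }));
    }
    // Wait for every delete to finish before reporting.
    for (Future<?> future : futures) {
      future.get();
    }
    return workerThreads;
  }

  /** Assumed cost model: a fixed 20 ms pause in place of a real file system call. */
  static void simulateDelete(String path) {
    try {
      Thread.sleep(20);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < 16; i++) {
      paths.add("data/file-" + i + ".parquet");
    }

    long start = System.nanoTime();
    Set<String> directThreads = deleteAll(paths, new DirectExecutor());
    long directMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      start = System.nanoTime();
      Set<String> poolThreads = deleteAll(paths, pool);
      long poolMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println("direct: " + directMs + " ms on " + directThreads);
      System.out.println("pool:   " + poolMs + " ms on " + poolThreads);
    } finally {
      pool.shutdown();
    }
  }
}
```

With the direct executor, every simulated delete runs on the caller, so the total time grows linearly with the number of files. Iceberg's `ExpireSnapshots` interface exposes `executeDeleteWith(ExecutorService)` for supplying a pool; how Hive should wire that in is not covered by this sketch.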
      

            People

              ayushtkn Ayush Saxena
              rajesh.balamohan Rajesh Balamohan

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h