Hive / HIVE-25535

Control cleaning obsolete directories/files of a table via property


Details

    Description

      Use Case -

      When an external tool such as SPARK_ACID accesses the Hive metastore directly instead of going through LLAP or HS2, it cannot acquire locks on metastore artifacts. As a result, if a Spark ACID job starts while Hive compaction is running, the job can fail with exceptions such as FileNotFoundException for a delta directory: the delta files are present during the Spark ACID compilation phase, but by the time execution starts the compactor has already deleted them.

      To tackle problems like this, I propose adding a "NO_CLEANUP" setting to table properties and partition properties, which provides finer-grained control over the table and partition compaction process.

      We already have "HIVE_COMPACTOR_DELAYED_CLEANUP_ENABLED", which lets us delay the deletion of obsolete directories/files, but it applies to every table in the metastore, whereas this property provides table-level and partition-level control.

      Solution -

      Add "NO_CLEANUP" to the table properties to enable or disable table-level and partition-level cleanup; when it is set to TRUE, the cleaner process does not automatically remove obsolete directories/files.
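      A minimal sketch of the check the cleaner could perform before removing obsolete directories/files. It assumes, for illustration only, that cleanup is skipped when NO_CLEANUP is "true" (case-insensitive) on either the table or the partition being cleaned, and that both parameter sets are plain string-to-string maps as stored in the metastore. The function names are hypothetical, not Hive APIs.

```python
from typing import Mapping, Optional

# Property key proposed in this issue.
NO_CLEANUP_KEY = "NO_CLEANUP"


def is_no_cleanup_set(params: Optional[Mapping[str, str]]) -> bool:
    """Return True if a parameter map has NO_CLEANUP set to 'true' (case-insensitive)."""
    if not params:
        return False
    return params.get(NO_CLEANUP_KEY, "false").strip().lower() == "true"


def should_skip_cleanup(
    table_params: Optional[Mapping[str, str]],
    partition_params: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical cleaner check (assumption): skip removing obsolete
    directories/files when NO_CLEANUP is true on the table OR on the partition."""
    return is_no_cleanup_set(table_params) or is_no_cleanup_set(partition_params)


if __name__ == "__main__":
    table = {"transactional": "true", "NO_CLEANUP": "TRUE"}
    print(should_skip_cleanup(table))  # True
    print(should_skip_cleanup({"transactional": "true"}, {"NO_CLEANUP": "true"}))  # True
    print(should_skip_cleanup({"transactional": "true"}))  # False
```

      Treating the flag as set on either level means a table-level TRUE also protects every partition; a different precedence (for example, letting a partition-level FALSE override the table) would be an equally reasonable choice for the real implementation.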

      Example -

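      ALTER TABLE <tablename> SET TBLPROPERTIES('NO_CLEANUP'='TRUE');   -- disable automatic cleanup
      ALTER TABLE <tablename> SET TBLPROPERTIES('NO_CLEANUP'='FALSE');  -- re-enable automatic cleanup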
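      Because HiveQL TBLPROPERTIES keys and values are string literals, the flag is written as 'TRUE' or 'FALSE' in quotes. The helper below is a hypothetical sketch that renders the statement for a given table name; it assumes the caller passes a trusted, already-qualified table identifier and does no quoting or escaping of its own.

```python
def no_cleanup_ddl(table_name: str, no_cleanup: bool) -> str:
    """Hypothetical helper: render the ALTER TABLE statement that sets NO_CLEANUP.

    Assumes table_name is a trusted identifier (no quoting or escaping is done).
    """
    # TBLPROPERTIES values are string literals in HiveQL, so TRUE/FALSE are quoted.
    value = "TRUE" if no_cleanup else "FALSE"
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES('NO_CLEANUP'='{value}');"


if __name__ == "__main__":
    print(no_cleanup_ddl("sales.orders", True))
    print(no_cleanup_ddl("sales.orders", False))
```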


            People

              Ashish Sharma (ashish-kumar-sharma)
              Votes: 0
              Watchers: 7


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 2.5h