Hive / HIVE-25277

Slow Hive partition deletion for Cloud object stores with expensive ListFiles

    Details

      Description

      Deleting a Hive partition is slow when the warehouse is a cloud object store for which ListFiles is expensive. A root cause is that the recursive deletion of parent directories is very inefficient: it makes many duplicated calls to isEmpty, each of which ends in a ListFiles call. This fix sorts the parent paths to delete by path length and always processes the longest one first (e.g., a/b/c is always processed before a/b). As a result, each parent path needs to be checked only once.
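
      The ordering idea described above can be sketched as follows. This is a minimal illustration, not the code from the actual patch: `ParentDirCleanup`, `FakeStore`, `deleteEmptyParents`, and the string-based paths are all hypothetical names chosen for the example, and the in-memory `FakeStore` stands in for an object store by counting how many emptiness checks (i.e. list calls) are made.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ParentDirCleanup {

    /** Hypothetical in-memory stand-in for an object store with costly listing. */
    static final class FakeStore {
        final Set<String> paths = new HashSet<>(); // every existing directory
        int listCalls = 0;                         // number of emptiness checks made

        boolean isEmpty(String dir) {
            listCalls++; // on a real store this would be one list request
            String prefix = dir + "/";
            for (String p : paths) {
                if (p.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }

        void delete(String dir) {
            paths.remove(dir);
        }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    static int depth(String path) {
        return (int) path.chars().filter(c -> c == '/').count();
    }

    /**
     * Removes empty ancestors of already-deleted partition directories,
     * stopping below {@code root}. Returns the directories removed, in order.
     */
    static List<String> deleteEmptyParents(FakeStore store,
                                           Collection<String> deletedPartitions,
                                           String root) {
        // Deepest path first; ties broken lexicographically. The set also
        // de-duplicates parents shared by several partitions.
        Comparator<String> deepestFirst = (a, b) -> {
            int byDepth = Integer.compare(depth(b), depth(a));
            return byDepth != 0 ? byDepth : a.compareTo(b);
        };
        TreeSet<String> pending = new TreeSet<>(deepestFirst);
        for (String partition : deletedPartitions) {
            addIfBelowRoot(pending, parentOf(partition), root);
        }

        List<String> removed = new ArrayList<>();
        while (!pending.isEmpty()) {
            String dir = pending.pollFirst();
            // All deeper candidates were handled already, so this single
            // check is final for this directory.
            if (store.isEmpty(dir)) {
                store.delete(dir);
                removed.add(dir);
                addIfBelowRoot(pending, parentOf(dir), root);
            }
        }
        return removed;
    }

    private static void addIfBelowRoot(Set<String> pending, String dir, String root) {
        if (dir != null && dir.startsWith(root + "/")) {
            pending.add(dir);
        }
    }
}
```

      In this sketch, cleaning up after three deleted partitions under `wh/t/y=2021` (two in `m=01`, one in `m=02`) checks `m=01`, `m=02`, and `y=2021` exactly once each. A naive walk up from every partition would check `m=01` and `y=2021` repeatedly.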

              People

              • Assignee:
                coufon Zhou Fang
                Reporter:
                coufon Zhou Fang
              • Votes:
                1
                Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  Logged:
                  Time Spent - 4.5h