Hive / HIVE-20025

Clean-up of event files created by HiveProtoLoggingHook.


Details

    Description

      Currently, HiveProtoLoggingHook writes event data to HDFS, and the number of files can grow very large.

      Since the files are created under directories whose path includes the date, Hive should provide a way to clean up data older than a configured time/date. This can be a job that runs as infrequently as once a day.
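      Because the date is part of the directory path, deciding what to clean up reduces to comparing each directory's date with a cutoff of (today minus the retention period). The sketch below shows that selection step only. It assumes a hypothetical "date=yyyy-MM-dd" directory naming scheme, and the class and method names are illustrative, not Hive's actual implementation; a real job would list directories through the Hadoop FileSystem API rather than take a list of names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class ExpiredEventDirs {
    // Assumed (hypothetical) prefix of the date-partitioned event directories.
    static final String PREFIX = "date=";

    /**
     * Returns the directory names whose date is strictly older than (today - ttlDays).
     * Names that do not match the assumed "date=yyyy-MM-dd" layout are skipped.
     */
    static List<String> findExpired(List<String> dirNames, LocalDate today, int ttlDays) {
        LocalDate cutoff = today.minusDays(ttlDays);
        List<String> expired = new ArrayList<>();
        for (String name : dirNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not a date partition
            }
            try {
                LocalDate dirDate = LocalDate.parse(name.substring(PREFIX.length()));
                if (dirDate.isBefore(cutoff)) {
                    expired.add(name);
                }
            } catch (DateTimeParseException e) {
                // Malformed date suffix: leave the directory alone.
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("date=2018-06-20", "date=2018-06-25", "date=2018-06-28", "tmp");
        // With a 7-day retention on 2018-06-28 the cutoff is 2018-06-21.
        System.out.println(findExpired(dirs, LocalDate.of(2018, 6, 28), 7));
    }
}
```

      Using a strict "older than the cutoff" comparison keeps the boundary day, so a directory is never removed while it could still be inside the retention window.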

      The retention period should default to 1 week. There should also be a sane upper bound on the number of files deleted, so that when a large cluster generates a lot of files during a spike, the clean-up does not force the cluster to fall over.
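      One way to honour such an upper bound is to cap the number of deletions per run and let the next scheduled run pick up whatever is left. The sketch below assumes that approach; it uses java.nio on a local filesystem purely to stay self-contained (a Hive job would go through the Hadoop FileSystem API on HDFS), and the class name, method name and maxFiles parameter are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BoundedEventCleaner {
    /**
     * Deletes regular files in the given expired directories, stopping after maxFiles
     * deletions. A directory is removed only once it is empty, so leftovers are handled
     * by the next run. Returns the number of files deleted in this run.
     */
    static int cleanUp(List<Path> expiredDirs, int maxFiles) throws IOException {
        int deleted = 0;
        for (Path dir : expiredDirs) {
            if (deleted >= maxFiles) {
                break; // budget for this run is used up
            }
            List<Path> files;
            try (Stream<Path> s = Files.list(dir)) {
                files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            }
            for (Path file : files) {
                if (deleted >= maxFiles) {
                    break;
                }
                Files.delete(file);
                deleted++;
            }
            // Remove the directory itself only if nothing is left in it.
            boolean empty;
            try (Stream<Path> s = Files.list(dir)) {
                empty = !s.findAny().isPresent();
            }
            if (empty) {
                Files.delete(dir);
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("events");
        Path day = Files.createDirectory(base.resolve("date=2018-06-20"));
        for (int i = 0; i < 5; i++) {
            Files.createFile(day.resolve("events_" + i));
        }
        System.out.println(cleanUp(List.of(day), 3)); // first run hits the cap
        System.out.println(cleanUp(List.of(day), 3)); // second run finishes the directory
        Files.delete(base);
    }
}
```

      Bounding the work per run trades clean-up latency for predictable load: a backlog after a spike is drained over several daily runs instead of in one burst of deletes against the NameNode.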

      Attachments

        1. HIVE-20025.01.patch
          18 kB
          Sankar Hariappan
        2. HIVE-20025.02.patch
          19 kB
          Sankar Hariappan
        3. HIVE-20025.03.patch
          18 kB
          Sankar Hariappan
        4. HIVE-20025.04.patch
          18 kB
          Sankar Hariappan
        5. HIVE-20025.01-branch-3.patch
          18 kB
          Sankar Hariappan


            People

              Assignee: Sankar Hariappan (sankarh)
              Reporter: Sankar Hariappan (sankarh)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: