Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 3.0.0
Description
Currently, HiveProtoLoggingHook writes event data to HDFS. The number of files can grow very large.
Since the files are created under a folder whose path includes the date, Hive should have a way to clean up data older than a configured age. This can be a job that runs as infrequently as once a day.
The retention period should default to 1 week. There should also be a sane upper bound on the number of files, so that a spike that generates a lot of files on a large cluster does not make the cluster fall over.
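The retention rule described above could be sketched roughly as follows. This is only an illustration of the selection logic, not the actual Hive implementation: the class name `RetentionSketch`, the method `selectForDeletion`, and the parameters `ttlDays` and `maxFiles` are hypothetical, and the sketch assumes that the per-day file counts have already been gathered from the date-based folders. It also reads the "upper bound" as a cap on the total number of retained files, which is one possible interpretation.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: decides which per-day folders a daily cleanup job would delete.
class RetentionSketch {

    /**
     * Returns the days to delete, oldest first.
     * A day is deleted if it is older than ttlDays before 'today', or if the
     * total number of retained files still exceeds maxFiles (assumed meaning
     * of the "upper bound": a cap on files kept, enforced oldest-day-first).
     */
    static List<LocalDate> selectForDeletion(Map<LocalDate, Integer> filesPerDay,
                                             LocalDate today, int ttlDays, long maxFiles) {
        LocalDate cutoff = today.minusDays(ttlDays);
        // TreeMap iterates in ascending key order, i.e. oldest day first.
        TreeMap<LocalDate, Integer> sorted = new TreeMap<>(filesPerDay);

        long retained = 0;
        for (int count : sorted.values()) {
            retained += count;
        }

        List<LocalDate> toDelete = new ArrayList<>();
        for (Map.Entry<LocalDate, Integer> e : sorted.entrySet()) {
            boolean expired = e.getKey().isBefore(cutoff);
            boolean overLimit = retained > maxFiles;
            if (!expired && !overLimit) {
                break; // every later day is newer, and the cap is already satisfied
            }
            toDelete.add(e.getKey());
            retained -= e.getValue();
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Map<LocalDate, Integer> filesPerDay = new HashMap<>();
        filesPerDay.put(LocalDate.of(2018, 6, 1), 10);
        filesPerDay.put(LocalDate.of(2018, 6, 2), 20);
        filesPerDay.put(LocalDate.of(2018, 6, 5), 500);
        filesPerDay.put(LocalDate.of(2018, 6, 9), 100);
        // 7-day retention with a cap of 150 retained files.
        System.out.println(selectForDeletion(filesPerDay, LocalDate.of(2018, 6, 10), 7, 150));
    }
}
```

A real cleaner would list the date folders on HDFS and delete the selected ones; that step is left out here so the sketch stays independent of any cluster.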