Affects Version/s: None
Fix Version/s: None
Currently, the .tmp files left after a system or Flume client crash are never cleaned up, and several users have noted that it would be better if Flume itself took care of this.
This is actually a complicated issue, with multiple facets. These include:
- We would need to persist the in-progress filenames somewhere, probably on the agent's local FS. This is not very hard.
- At startup, we would need to handle the leftover files in a way that guarantees at least one of the following:
  - Mark the file as potentially partial when renaming it from .tmp
  - Ensure that the file format is valid before renaming it from .tmp
- The 2nd option is harder than it sounds, since arbitrary serializers may be plugged in. For example, with an XML serializer we would need some way to programmatically read (deserialize) the file, discard any potentially unfinished records at the end (this is OK since their transaction cannot have been committed), and then re-serialize the file with all the valid records and the correct opening/closing tags.
- General deserialization / recovery APIs would need to be added to support this, and they would need to be very carefully designed and implemented in order to work. If this turns out to be complex (and it sounds complex), it seems likely that most people would rely on out-of-the-box implementations (supported file formats) to get this functionality, unless they are building on top of abstract classes (e.g. for XML schema handling) that help accomplish this.
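
To make the startup step concrete, here is a minimal sketch of recovery for the simplest possible format: newline-delimited text on a local filesystem. It drops a trailing record that has no terminating newline, then renames the file out of .tmp, adding a marker suffix if anything was discarded. The class name `TmpFileRecovery`, its methods, and the `.recovered` suffix are hypothetical and not part of Flume; a real implementation would likely need to go through the sink's own filesystem API and a per-serializer recovery hook.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration only; not part of Flume's API. */
final class TmpFileRecovery {
    private static final String TMP_SUFFIX = ".tmp";
    // Assumed marker for files that lost a partial trailing record.
    private static final String RECOVERED_SUFFIX = ".recovered";

    private TmpFileRecovery() {}

    /** Returns the length of the longest prefix that ends in '\n' (0 if there is none). */
    static long lastCompleteRecordEnd(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            // Scan backwards one byte at a time; simple, but slow for long partial tails.
            for (long pos = raf.length() - 1; pos >= 0; pos--) {
                raf.seek(pos);
                if (raf.read() == '\n') {
                    return pos + 1;
                }
            }
            return 0;
        }
    }

    /**
     * Truncates any partial trailing record, then renames the file out of .tmp.
     * Returns the final path; it carries the marker suffix if data was discarded.
     */
    static Path recover(Path tmp) throws IOException {
        String name = tmp.getFileName().toString();
        if (!name.endsWith(TMP_SUFFIX)) {
            throw new IllegalArgumentException("Not a .tmp file: " + tmp);
        }
        long validLength = lastCompleteRecordEnd(tmp);
        boolean truncated;
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
            truncated = validLength < raf.length();
            if (truncated) {
                raf.setLength(validLength); // drop the unfinished record
            }
        }
        String base = name.substring(0, name.length() - TMP_SUFFIX.length());
        Path target = tmp.resolveSibling(truncated ? base + RECOVERED_SUFFIX : base);
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Under these assumptions, calling `TmpFileRecovery.recover(path)` on each filename recorded as in-progress would cover both guarantees for this one format. Formats with framing, such as XML, could not be repaired by truncation alone, which is why a general recovery API is the hard part.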