Details
Type: Bug
Status: Done
Priority: Major
Resolution: Done
Description
This results from https://github.com/apache/metron/pull/505
That PR breaks the standard convention of choosing a file name and rotating that file repeatedly, because any message can now be routed to a different file based on a Stellar statement. The break was noted in the PR but considered acceptable, because we didn't care about the rotation number anyway.
This works fine for the 0th rotation (a new file is opened, data is written, the file is closed). On the first rotation, however, we signal to the HdfsWriter that the file has been closed, in order to limit the maximum number of open files, but we still create a new file with rotation 1. That file never receives any data, because we no longer maintain an open file reference to it, yet the SourceHandler for it stays open and its Timer keeps attempting further (pointless) rotations. Note that no data is lost: any data that would have gone into this file goes into a new 0-rotation file instead.
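To make the sequence above concrete, here is a toy in-memory model of it. This is only a sketch: RotationLeakModel, Handler, fireAllTimers and the file naming scheme are hypothetical names invented for illustration and are not Metron's HdfsWriter or SourceHandler code. Files are modelled as map entries with a byte count, and "the timer firing" is modelled as an explicit method call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of the behaviour described above; not Metron's actual classes.
class RotationLeakModel {
    // One entry per "file" ever created: file name -> bytes written.
    final Map<String, Integer> files = new HashMap<>();
    // Handlers the writer can still route data to, keyed by routing key.
    final Map<String, Handler> openHandlers = new HashMap<>();
    // Every handler ever created, including orphaned ones whose timers still fire.
    final List<Handler> allHandlers = new ArrayList<>();
    private int nextId = 0;

    class Handler {
        final String key;
        final int id;
        int rotation = 0;
        String currentFile;

        Handler(String key) {
            this.key = key;
            this.id = nextId++;
            openFile();
        }

        private void openFile() {
            // Hypothetical naming scheme: <key>-<handler id>-<rotation>.
            currentFile = key + "-" + id + "-" + rotation;
            files.put(currentFile, 0);
        }

        void write(int bytes) {
            files.merge(currentFile, bytes, Integer::sum);
        }

        // Buggy rotation: drop the writer's reference, yet still open the next file.
        void rotate() {
            openHandlers.remove(key, this);
            rotation++;
            openFile(); // nothing routes data here any more, so it stays at 0 bytes
        }
    }

    void write(String key, int bytes) {
        Handler h = openHandlers.get(key);
        if (h == null) {
            // No open handler for this key: start a fresh rotation-0 file.
            h = new Handler(key);
            openHandlers.put(key, h);
            allHandlers.add(h);
        }
        h.write(bytes);
    }

    // Stand-in for every handler's timer firing once.
    void fireAllTimers() {
        for (Handler h : new ArrayList<>(allHandlers)) {
            h.rotate();
        }
    }

    long emptyFileCount() {
        return files.values().stream().filter(size -> size == 0).count();
    }
}
```

In this model, writing to a key, firing the timers, writing again and firing the timers again leaves two files holding data and three empty ones, and the empty count keeps growing with every further timer tick even if nothing is written.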
This becomes more obvious the longer the cluster runs or the shorter the file timeout is. Because every orphaned handler keeps attempting rotations, large numbers of 0-byte files are eventually created.
An easy fix is to stop creating new files during rotations while still performing the RotationActions. This means that every file will have rotation 0 (a number we don't actually use for anything anyway). More complicated things could be done (e.g. evicting the oldest file from a cache), but that seems heavy-handed for maintaining a rotation count we don't care about. Additionally, the Timer should be cancelled when the reference is removed from the HdfsWriter.
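The proposed fix might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual patch: RotationFixSketch, Handler, handlerFor and the file naming are hypothetical, the RotationActions are reduced to a list append, and only java.util.Timer is a real API here. The point it illustrates is that rotation closes the file, runs the actions, removes the handler from the writer and cancels the timer, without opening a successor file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of the proposed fix; names and structure are illustrative only.
class RotationFixSketch {
    final Map<String, Handler> openHandlers = new HashMap<>();
    final List<String> filesCreated = new ArrayList<>();
    final List<String> rotationActionsRun = new ArrayList<>();

    class Handler {
        final String key;
        final String file;
        final Timer timer = new Timer(true); // daemon thread so the JVM can exit
        boolean closed = false;

        Handler(String key, long rotateAfterMillis) {
            this.key = key;
            // Every file is a rotation-0 file; the middle part only keeps names unique.
            this.file = key + "-" + filesCreated.size() + "-0";
            filesCreated.add(file);
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    rotate();
                }
            }, rotateAfterMillis);
        }

        // Close the file, run the rotation actions, retire the handler. No new file.
        void rotate() {
            synchronized (RotationFixSketch.this) {
                if (closed) {
                    return; // already retired, nothing left to do
                }
                closed = true;
                rotationActionsRun.add(file);   // stand-in for the RotationActions
                openHandlers.remove(key, this); // the writer forgets this handler
                timer.cancel();                 // no further pointless rotations
            }
        }
    }

    // Returns the open handler for a key, creating a rotation-0 file if none is open.
    synchronized Handler handlerFor(String key, long rotateAfterMillis) {
        return openHandlers.computeIfAbsent(key, k -> new Handler(k, rotateAfterMillis));
    }
}
```

With this shape, a handler that has rotated once is finished: later data for the same key gets a fresh handler and a fresh rotation-0 file, and the retired handler's timer can no longer fire.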