Flume / FLUME-253

Specifying the size of the WAL directory where a Flume node agent writes its log

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: v0.9.4
    • Fix Version/s: v0.9.5
    • Component/s: Node
    • Labels:
      None

      Description

      Each Flume node generates a write-ahead log (WAL) in the directory specified by "flume.agent.logdir".

      Suppose a Flume node runs on a production server and forwards the logs that server generates to a collector node running on another server. An operator would want to cap the size of the WAL directory, because no one wants a production server to run out of disk space when a collector node is down.

      Those who cannot afford to lose any log data should have an option that supports that choice too.

      The requirements can be stated as follows:
      1) One should be able to specify the size of the WAL directory, and there should be an "unlimited" option too. The "unlimited" option serves those who do not want to lose their logs at any cost.

      2) If a size is specified, the WAL directory should follow a selectable policy once the limit is reached:
      a) either always keep the latest log data in the WAL directory,
      b) or stop writing to the WAL directory as soon as it reaches the size limit.
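
The two requirements above can be sketched as a small in-memory model. This is only an illustration, not Flume code: the class `WalSizePolicySketch`, the `FullPolicy` enum, and the `UNLIMITED` constant are hypothetical names, byte arrays stand in for WAL files on disk, and option (a) is interpreted here as discarding the oldest entries to make room for the newest, which is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a size-capped WAL directory; not part of Flume. */
class WalSizePolicySketch {
    /** What to do when an append would exceed the size limit. */
    enum FullPolicy { DROP_OLDEST, STOP_WRITING }

    /** Sentinel for requirement 1: no size limit at all. */
    static final long UNLIMITED = -1L;

    private final long maxBytes;
    private final FullPolicy policy;
    private final Deque<byte[]> entries = new ArrayDeque<>();
    private long usedBytes = 0;

    WalSizePolicySketch(long maxBytes, FullPolicy policy) {
        this.maxBytes = maxBytes;
        this.policy = policy;
    }

    /** Returns true if the entry was stored, false if it was rejected. */
    boolean append(byte[] entry) {
        if (maxBytes != UNLIMITED) {
            if (entry.length > maxBytes) {
                return false; // can never fit, whatever the policy
            }
            if (policy == FullPolicy.STOP_WRITING
                    && usedBytes + entry.length > maxBytes) {
                return false; // requirement 2b: stop at the limit
            }
            // Requirement 2a (assumed meaning): evict oldest entries
            // until the newest one fits.
            while (usedBytes + entry.length > maxBytes) {
                usedBytes -= entries.removeFirst().length;
            }
        }
        entries.addLast(entry);
        usedBytes += entry.length;
        return true;
    }

    long usedBytes() { return usedBytes; }

    int entryCount() { return entries.size(); }
}
```

With a 10-byte limit, two 6-byte appends leave one entry under `DROP_OLDEST` (the older one is evicted), while under `STOP_WRITING` the second append is rejected and the first entry stays.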

        Activity

        Arvind Prabhakar made changes -
        Affects Version/s v0.9.4 [ 12317557 ]
        Mark Thomas made changes -
        Project Import Tue Aug 02 16:57:12 UTC 2011 [ 1312304232406 ]
        Jonathan Hsieh made changes -
        Fix Version/s v0.9.4 [ 10050 ]
        Fix Version/s v0.9.5 [ 10090 ]
        E. Sammer added a comment -

        I think it's worth noting that if someone is using the WAL (i.e. E2E reliability mode), they're saying they don't want to lose data. If that is true, then when the WAL fills you no longer want to handle requests (i.e. generate logging data), and you probably want to block the logger, telling the clients that cause the logged data to go to another machine. Of course, having a collector failover chain also reduces the possibility of filling a disk. To be clear, it's not that I think we shouldn't have a way to limit the WAL's growth, just that it probably doesn't make sense to stop writing to the WAL but continue to handle requests (or generate logs): that basically means you can't guarantee delivery anymore.
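
The blocking behaviour described in this comment can be sketched with a byte budget that producers must reserve before writing. This is a minimal illustration under stated assumptions, not how Flume implements its WAL: `BlockingWalBudgetSketch`, `reserve`, and `release` are hypothetical names, and the idea that space is released once a collector acknowledges the data is an assumption about the surrounding design.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical byte budget that blocks writers while the WAL is full. */
class BlockingWalBudgetSketch {
    // Each permit represents one free byte in the WAL directory.
    private final Semaphore freeBytes;

    BlockingWalBudgetSketch(int capacityBytes) {
        this.freeBytes = new Semaphore(capacityBytes);
    }

    /**
     * Waits up to timeoutMillis for room. Returns false if the WAL stayed
     * full, in which case the caller should refuse the request so the
     * client can go to another machine.
     */
    boolean reserve(int bytes, long timeoutMillis) {
        try {
            return freeBytes.tryAcquire(bytes, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    /** Frees space, e.g. after the collector has acknowledged the data (assumed). */
    void release(int bytes) {
        freeBytes.release(bytes);
    }

    int availableBytes() {
        return freeBytes.availablePermits();
    }
}
```

Because a writer that cannot reserve space never produces unlogged data, nothing is accepted that the WAL cannot hold, which keeps the delivery guarantee intact at the cost of refusing work.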

        Jonathan Hsieh made changes -
        Fix Version/s v0.9.4 [ 10050 ]
        Fix Version/s v0.9.3 [ 10040 ]
        Affects Version/s v0.9.4 [ 10050 ]
        Jonathan Hsieh made changes -
        Affects Version/s v0.9.4 [ 10050 ]
        Affects Version/s v0.9.1 [ 10013 ]
        Affects Version/s v0.9.0 [ 10014 ]
        Affects Version/s v0.9.2 [ 10022 ]
        Jonathan Hsieh added a comment -

        Another way to support this goal is to do something similar to the dfs.datanode.du.reserved parameter in Hadoop's HDFS: if the volume that Flume's WAL is writing to has less than the specified number of bytes free, use a specified WAL data policy.
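
A reserved-space check of this kind might look like the sketch below. It is an assumption-laden illustration rather than Flume or HDFS code: `WalReservedSpaceSketch`, `Action`, and `decide` are hypothetical names, and only `java.io.File.getUsableSpace()` is a real JDK call.

```java
import java.io.File;

/** Hypothetical free-space guard modelled on a "reserved bytes" setting. */
class WalReservedSpaceSketch {
    /** WRITE normally, or fall back to the configured WAL-full policy. */
    enum Action { WRITE, APPLY_FULL_POLICY }

    /** Pure decision: compare the free bytes on the volume with the reserve. */
    static Action decide(long usableBytes, long reservedBytes) {
        return usableBytes < reservedBytes ? Action.APPLY_FULL_POLICY : Action.WRITE;
    }

    /** Same decision, reading the free space of the volume holding walDir. */
    static Action decide(File walDir, long reservedBytes) {
        return decide(walDir.getUsableSpace(), reservedBytes);
    }
}
```

Compared with a fixed cap on the WAL directory itself, this approach also reacts when other processes fill the same volume.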

        Jonathan Hsieh made changes -
        Field Original Value New Value
        Fix Version/s v0.9.3 [ 10040 ]
        Jonathan Hsieh added a comment -

        Guru,

        This is a great idea. It is something we do not currently support and would really like to. I've raised this to critical priority.

        Jon.

        Disabled imported user created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Disabled imported user
          • Votes:
            2
            Watchers:
            5
