Flume / FLUME-273

RollSink should expose ability to configure SizeTrigger

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: v0.9.1
    • Fix Version/s: v0.9.5
    • Component/s: None
    • Labels:
      None
    • Environment:

      Red Hat Linux

      Description

      Currently the user can only configure a roll sink to roll based on a time trigger, not on a size trigger. A size trigger would be much more useful, since our servers have periods of high load and periods of very light load. Using a time trigger during the light-load periods ends up creating lots of small files.

          Activity

          flume_amp added a comment -

          This enhancement would also be useful to minimize the wasted space when writing to HDFS, by ensuring that the resulting files are at or just under the HDFS block size. For our usage it might actually be optimal to have a size trigger with an optional time trigger in addition (i.e., roll over when the size trigger is hit, or when the configured time elapses without the size trigger being hit).

          Jonathan Hsieh added a comment -

          Raising the priority because this has been requested at least 3 times on the user list now.

          Disabled imported user added a comment -

          Yay!

          FYI, seems to be a dupe of https://issues.cloudera.org/browse/FLUME-541.

          E. Sammer added a comment (edited) -

          "This enhancement would also be useful to minimize the wasted space when writing to HDFS to ensure that the resulting files are at or just under the HDFS block size."

          Just FYI, there is no (block) space waste if data is below a block size. HDFS does not burn the entire block size on the DN if the block data doesn't occupy the complete block. In other words, a 4K file in HDFS contains a 4K block and occupies 4K on disk. With a 64MB block size, a 65MB file is made up of a "full" 64MB block and a 1MB block, and occupies 65MB on disk.
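The block arithmetic above can be sketched in a few lines of Java. This is only an illustration of the point being made, not HDFS code: the class and method names are hypothetical, the 64MB block size is the one used in the example, and the sketch ignores replication and checksum metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HDFS or Flume.
class BlockLayout {

    /** Splits a file length into the lengths of the blocks that would hold it. */
    static List<Long> blockLengths(long fileBytes, long blockSizeBytes) {
        if (fileBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        List<Long> blocks = new ArrayList<>();
        long remaining = fileBytes;
        while (remaining > 0) {
            // The last block only holds what is left, not a full block.
            long len = Math.min(remaining, blockSizeBytes);
            blocks.add(len);
            remaining -= len;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        System.out.println(blockLengths(4 * 1024L, 64 * MB)); // [4096]
        System.out.println(blockLengths(65 * MB, 64 * MB));   // [67108864, 1048576]
    }
}
```

The sum of the block lengths always equals the file length, which is the sense in which a short final block wastes no disk space.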

          Maybe you're referring to space waste in the NN memory? If that's the case, that's the same as what the description means by "during the light load periods would end up creating lots of small files." This is a degenerate situation and should be avoided.

          "For our usage it might actually be optimal to have a size trigger with an optional time trigger in addition (ie roll over when the size trigger is hit or after it fails to be hit after the configured time)."

          This is usually what people really want when they think about it. +1.
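As a rough sketch of the combined behavior discussed here (roll when the size threshold is reached, or when the configured time passes first), something like the following could work. This is not the committed patch and not Flume's actual RollSink or SizeTrigger API: the class name, method names, and the injected clock are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical trigger for illustration only; not Flume's real trigger classes.
class SizeOrTimeTrigger {
    private final long maxBytes;      // roll once this many bytes have been written
    private final long maxAgeMillis;  // roll after this long; <= 0 disables the time check
    private final LongSupplier clock; // injected so the logic can be tested without sleeping
    private long bytesSinceRoll = 0;
    private long lastRollMillis;

    SizeOrTimeTrigger(long maxBytes, long maxAgeMillis, LongSupplier clock) {
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.lastRollMillis = clock.getAsLong();
    }

    /** Records the size of an event written since the last roll. */
    void append(long eventBytes) {
        bytesSinceRoll += eventBytes;
    }

    /** True if either the size threshold or the optional time threshold is met. */
    boolean isTriggered() {
        boolean sizeHit = bytesSinceRoll >= maxBytes;
        boolean timeHit = maxAgeMillis > 0
                && clock.getAsLong() - lastRollMillis >= maxAgeMillis;
        return sizeHit || timeHit;
    }

    /** Called after the sink rolls to a new file. */
    void reset() {
        bytesSinceRoll = 0;
        lastRollMillis = clock.getAsLong();
    }
}
```

In real use the clock would presumably be `System::currentTimeMillis`. Under light load the time check bounds how long data sits in an open file, while under heavy load the size check keeps files near the target size.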

          Jonathan Hsieh added a comment -

          Review here: https://review.cloudera.org/r/1833/

          Jonathan Hsieh added a comment -

          committed


            People

            • Assignee: Jonathan Hsieh
            • Reporter: Disabled imported user
            • Votes: 8
            • Watchers: 8
