Details

    1. FLUME-761.patch.5
      49 kB
      Prasad Mujumdar
    2. FLUME-761.patch.4
      30 kB
      Prasad Mujumdar
    3. FLUME-761.patch.4
      30 kB
      Prasad Mujumdar
    4. FLUME-761.patch.2
      12 kB
      Prasad Mujumdar
    5. FLUME-761.patch.1
      8 kB
      Prasad Mujumdar

      Issue Links

        Activity

        Transition                        Time In Source Status   Execution Times   Last Executer     Last Execution Date
        Open -> In Progress               2d 4h 19m               1                 Prasad Mujumdar   09/Sep/11 20:34
        In Progress -> Patch Available    18d 5h 6m               1                 Prasad Mujumdar   28/Sep/11 01:40
        Patch Available -> Resolved       7d 23h 18m              1                 E. Sammer         06/Oct/11 00:58
        Resolved -> Closed                18s                     1                 E. Sammer         06/Oct/11 00:59
        Sharad Agarwal made changes -
        Link This issue relates to FLUME-855 [ FLUME-855 ]
        Sharad Agarwal made changes -
        Link This issue is blocked by FLUME-855 [ FLUME-855 ]
        Sharad Agarwal made changes -
        Link This issue is blocked by FLUME-855 [ FLUME-855 ]
        E. Sammer made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        E. Sammer made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s NG [ 12318440 ]
        Resolution Fixed [ 1 ]
        E. Sammer added a comment -

        Committed to the flume-728 branch. Thanks Prasad!

        E. Sammer made changes -
        Affects Version/s NG [ 12318440 ]
        E. Sammer added a comment -

        Committed to flume-728 branch with modifications.

        • Renamed classes to be camel case.
        • Moved HDFS* related classes into o.a.flume.sink.hdfs.
        • Added Hadoop as a dependency to the pom so we properly build / test.
        Prasad Mujumdar made changes -
        Attachment FLUME-761.patch.5 [ 12497581 ]
        Prasad Mujumdar added a comment -

        Compression and parametric serializer interface

        Prasad Mujumdar made changes -
        Status In Progress [ 3 ] Patch Available [ 10002 ]
        Prasad Mujumdar made changes -
        Attachment FLUME-761.patch.4 [ 12496823 ]
        Prasad Mujumdar made changes -
        Attachment FLUME-761.patch.4 [ 12496822 ]
        Prasad Mujumdar added a comment -

        HDFS sink with roll, batch and bucketing support.
        The compression part is not fully tested yet.

        Prasad Mujumdar made changes -
        Attachment FLUME-761.patch.2 [ 12494132 ]
        Prasad Mujumdar added a comment -

        First cut with tests
        Limited functionality, no bucketing support yet.

        Prasad Mujumdar made changes -
        Attachment FLUME-761.patch.1 [ 12493843 ]
        Prasad Mujumdar added a comment -

        First cut.
        Limited functionality, not fully tested. Additional changes and tests will follow.

        Prasad Mujumdar made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        E. Sammer made changes -
        Assignee E. Sammer [ esammer ] Prasad Mujumdar [ prasadm ]
        E. Sammer added a comment -

        Assigning to Prasad.

        E. Sammer added a comment -

        Port the Flume HDFS sink functionality over to Flume NG.

        The interesting features are file rotation, output bucketing, and support for append (flush).

        A minimal implementation would support file rotation. Rotation should be configurable based on both time interval (specified in seconds) and size. Ideally, we do not create files unless there are events to output (i.e. lazy file creation). It should be possible to specify rotation for time and size together, meaning rotate on whichever happens first.
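
The "whichever happens first" rule can be sketched roughly as below. This is only an illustration: RollPolicy, rollIntervalSeconds and rollSizeBytes are hypothetical names, not part of any Flume API or of the attached patches, and treating 0 as "disabled" is an assumption.

```java
/** Hypothetical helper for illustration; not taken from the Flume code base. */
final class RollPolicy {
    private final long rollIntervalSeconds; // assumed convention: 0 disables time-based rotation
    private final long rollSizeBytes;       // assumed convention: 0 disables size-based rotation

    RollPolicy(long rollIntervalSeconds, long rollSizeBytes) {
        if (rollIntervalSeconds < 0 || rollSizeBytes < 0) {
            throw new IllegalArgumentException("rotation limits must not be negative");
        }
        this.rollIntervalSeconds = rollIntervalSeconds;
        this.rollSizeBytes = rollSizeBytes;
    }

    /** Returns true as soon as either the time limit or the size limit is reached. */
    boolean shouldRoll(long openedAtMillis, long bytesWritten, long nowMillis) {
        boolean timeUp = rollIntervalSeconds > 0
                && nowMillis - openedAtMillis >= rollIntervalSeconds * 1000L;
        boolean sizeUp = rollSizeBytes > 0 && bytesWritten >= rollSizeBytes;
        return timeUp || sizeUp;
    }
}
```

Lazy file creation would fit alongside this by opening the output file only when the first event for it arrives, so that openedAtMillis is set at that point rather than at sink start-up.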

        Output bucketing is a feature supported by Flume today that allows interpolation of event attributes in output paths. For instance, an output path of /logs/%{year}/%{month}/%{day}/ should become /logs/2011/01/01/ for an event with the attributes year=2011, month=01, day=01. This implies we must keep multiple writers open concurrently, each with separate bookkeeping on rotation time and output size.
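
As a rough sketch of the interpolation step, assuming event attributes arrive as a plain string map: BucketPath and expand are hypothetical names for illustration, and failing on a missing attribute is an assumption rather than documented Flume behavior.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not taken from the Flume code base. */
final class BucketPath {
    // Matches tokens of the form %{name}
    private static final Pattern TOKEN = Pattern.compile("%\\{(\\w+)\\}");

    /** Replaces each %{name} token in the template with the matching attribute value. */
    static String expand(String template, Map<String, String> attributes) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = attributes.get(m.group(1));
            if (value == null) {
                // Assumption: a missing attribute is treated as an error.
                throw new IllegalArgumentException("no attribute for " + m.group());
            }
            // quoteReplacement keeps '$' and '\' in values from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The expanded path is then a natural key for a map of open writers, which is one way to keep the per-bucket rotation time and output size separate.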

        Support for append should be orthogonal to file rotation. In other words, we should still allow the user to specify a rotation policy (time and size), but we should call flush with a given frequency, probably specified in terms of the number of events. A fully durable configuration would flush after each event (i.e. flushInterval=1). We should only enable append support if the underlying HDFS installation supports it. If the user specifies a flush policy and HDFS doesn't support append, we should warn, but continue.
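
One possible shape for the flush bookkeeping is sketched below. FlushCounter and eventWritten are hypothetical names, the appendSupported flag stands in for whatever check the sink would actually make against HDFS, and the warning goes to stderr only to keep the example self-contained.

```java
/** Hypothetical helper for illustration; not taken from the Flume code base. */
final class FlushCounter {
    private final int flushInterval;       // events between flushes; assumed: 0 means no flush policy
    private final boolean appendSupported; // assumed to be determined elsewhere by probing HDFS
    private int pending;                   // events written since the last flush

    FlushCounter(int flushInterval, boolean appendSupported) {
        this.flushInterval = flushInterval;
        this.appendSupported = appendSupported;
        if (flushInterval > 0 && !appendSupported) {
            // Warn, but continue: the sink keeps writing, it just never flushes.
            System.err.println("WARN: flush policy set but HDFS does not support append");
        }
    }

    /** Call once per event written; returns true when the caller should flush now. */
    boolean eventWritten() {
        if (flushInterval <= 0 || !appendSupported) {
            return false;
        }
        pending++;
        if (pending >= flushInterval) {
            pending = 0;
            return true;
        }
        return false;
    }
}
```

With flushInterval=1 every call returns true, which corresponds to the fully durable configuration; rotation stays independent because nothing here looks at file age or size.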

        E. Sammer made changes -
        Field Original Value New Value
        Component/s Build [ 12315318 ]
        Component/s Docs [ 12315316 ]
        Component/s Master [ 12315314 ]
        Component/s Node [ 12315315 ]
        Component/s Shell [ 12315317 ]
        Component/s Sinks+Sources [ 12315320 ]
        Component/s Technical Debt [ 12315321 ]
        Component/s Test [ 12315319 ]
        Component/s Web [ 12315322 ]
        E. Sammer created issue -

          People

          • Assignee:
            Prasad Mujumdar
            Reporter:
            E. Sammer
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
