Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.6.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      There are a number of in-progress or needed enhancements to the Kite DatasetSink:

      I think it makes sense to tackle these as a more general DatasetSink 2.0 effort.

      1. Dataset Sink 2.0 Design Document.pdf
        72 kB
        Joey Echeverria
      2. FLUME-2591-0.patch
        80 kB
        Joey Echeverria
      3. FLUME-2591-1.patch
        85 kB
        Joey Echeverria
      4. FLUME-2591-2.patch
        86 kB
        Joey Echeverria

        Issue Links

          Activity

          hudson Hudson added a comment -

          UNSTABLE: Integrated in flume-trunk #710 (See https://builds.apache.org/job/flume-trunk/710/)
          FLUME-2591. DatasetSink 2.0 (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1d49ef704a8bb08280b4e653e6db94dc3d2c2475)

          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSinkConstants.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSink.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/SavePolicy.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParser.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/RetryPolicy.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicyFactory.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParserFactory.java
          • flume-ng-sinks/flume-dataset-sink/src/test/java/org/apache/flume/sink/kite/TestDatasetSink.java
          • pom.xml
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicy.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/NonRecoverableEventException.java
          hudson Hudson added a comment -

          UNSTABLE: Integrated in Flume-trunk-hbase-98 #67 (See https://builds.apache.org/job/Flume-trunk-hbase-98/67/)
          FLUME-2591. DatasetSink 2.0 (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=1d49ef704a8bb08280b4e653e6db94dc3d2c2475)

          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParserFactory.java
          • flume-ng-sinks/flume-dataset-sink/src/test/java/org/apache/flume/sink/kite/TestDatasetSink.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/NonRecoverableEventException.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSink.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/EntityParser.java
          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/parser/AvroParser.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/RetryPolicy.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/DatasetSinkConstants.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicy.java
          • pom.xml
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/SavePolicy.java
          • flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite/policy/FailurePolicyFactory.java
          hshreedharan Hari Shreedharan added a comment -

          Committed! Thanks Joey!

          jira-bot ASF subversion and git services added a comment -

          Commit 914106c0f12650a7b16ba565e2aaddaad3d95540 in flume's branch refs/heads/flume-1.6 from Hari Shreedharan
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=914106c ]

          FLUME-2591. DatasetSink 2.0

          (Joey Echeverria via Hari)

          jira-bot ASF subversion and git services added a comment -

          Commit 1d49ef704a8bb08280b4e653e6db94dc3d2c2475 in flume's branch refs/heads/trunk from Hari Shreedharan
          [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=1d49ef7 ]

          FLUME-2591. DatasetSink 2.0

          (Joey Echeverria via Hari)

          hshreedharan Hari Shreedharan added a comment -

          +1. Running tests now. I'd like to add more detailed documentation of Parquet + memory requirements, though that can go into another jira.

          rdblue Ryan Blue added a comment -

          +1 (non-binding)

          fwiffo Joey Echeverria added a comment -

          This patch should be good to go.

          fwiffo Joey Echeverria added a comment -

          I have one more parameter name change, so hold off on committing this for now.

          fwiffo Joey Echeverria added a comment -

          I updated the patch following the feedback from Ryan and Tom.

          fwiffo Joey Echeverria added a comment -

          I formatted the latest version as a patch file. This should match what's in the PR.

          fwiffo Joey Echeverria added a comment -

          I turned the compare into a PR against my trunk (updated a few minutes ago to match upstream trunk):

          https://github.com/joey/flume/pull/1

          This should make it easier to make comments.

          fwiffo Joey Echeverria added a comment -

          Yes, but I don't want this to have to wait for the 0.18.0 release.

          I filed FLUME-2596 to track 0.18 related updates.

          rdblue Ryan Blue added a comment -

          Should this also handle the DatasetRecordException that will be in 0.18.0?

          fwiffo Joey Echeverria added a comment -

          I updated the patch with more tests and added the docs. The only thing left is CSV parsing, but that should probably be done in a follow-on ticket as it requires an unreleased version of Kite.

          fwiffo Joey Echeverria added a comment -

          I've written a patch that's ready for initial review:

          https://github.com/joey/flume/compare/FLUME-2591-datasetsink-2.0

          It still needs more tests written, but everything is implemented. I based the EntityParser code on the patch that Ryan Blue wrote.

          Because so much has changed, it might be easier to just review it as all new code:

          https://github.com/joey/flume/tree/FLUME-2591-datasetsink-2.0/flume-ng-sinks/flume-dataset-sink/src/main/java/org/apache/flume/sink/kite

          All of the existing tests passed without modification and any old configs are also respected.

          TODO:

          1. Write more tests
          2. Update docs with new configuration parameters
          3. Add support for CSV parsing (this requires a new Kite release)

          fwiffo Joey Echeverria added a comment -

          Here's a design document going more in-depth into the rationale and current thinking around the design.


            People

            • Assignee:
              fwiffo Joey Echeverria
              Reporter:
              fwiffo Joey Echeverria
            • Votes:
              0
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved:
