[SPARK-1645] Improve Spark Streaming compatibility with Flume - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: DStreams
Labels: None
Target Version/s: 1.1.0

Description

Currently, the following issues affect Spark Streaming and Flume compatibility:

If a Spark worker goes down, it needs to be restarted on the same node, or else Flume cannot send data to it. We can fix this by adding a Flume receiver that polls Flume, and a Flume sink that supports this.

The receiver sends acks to Flume before the driver knows about the data. The new receiver should also handle this case.

Data loss when the driver goes down: this is true for any streaming ingest, not just Flume. I will file a separate JIRA for this and we should work on it there. It is a longer-term project and requires considerable development work.

I intend to start working on these soon. Any input is appreciated. (It'd be great if someone could add me as a contributor on JIRA, so I can assign the issue to myself.)
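To make the first two items concrete, here is a toy simulation of a pull model in which the receiver acknowledges a batch only after it has been stored. It is a sketch under stated assumptions, not the Flume or Spark API: `ToySink`, `receive_all`, `flaky_store`, and the ack/nack protocol are all hypothetical names invented for illustration, and the design that was actually implemented may differ.

```python
from collections import deque


class ToySink:
    """Hypothetical stand-in for a Flume sink that buffers events until polled."""

    def __init__(self, events):
        self._pending = deque(events)   # events not yet handed out
        self._in_flight = {}            # batch_id -> events awaiting an ack
        self._next_id = 0

    def poll(self, max_events):
        """Hand out up to max_events; they stay in flight until acked or nacked."""
        count = min(max_events, len(self._pending))
        batch = [self._pending.popleft() for _ in range(count)]
        batch_id = self._next_id
        self._next_id += 1
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """The receiver stored the batch, so the sink may forget it."""
        self._in_flight.pop(batch_id)

    def nack(self, batch_id):
        """The receiver failed: put the batch back, in order, for redelivery."""
        self._pending.extendleft(reversed(self._in_flight.pop(batch_id)))

    def remaining(self):
        return len(self._pending) + sum(len(b) for b in self._in_flight.values())


def receive_all(sink, store, batch_size=2):
    """Poll until the sink is drained, acking only after store() succeeds."""
    while sink.remaining() > 0:
        batch_id, batch = sink.poll(batch_size)
        try:
            store(batch)
        except IOError:
            sink.nack(batch_id)   # nothing was acked, so nothing is lost
        else:
            sink.ack(batch_id)


# Demo: the first store attempt fails, and the batch is redelivered.
stored = []
failures = {"left": 1}


def flaky_store(batch):
    if failures["left"] > 0:
        failures["left"] -= 1
        raise IOError("simulated receiver failure")
    stored.extend(batch)


sink = ToySink(["e1", "e2", "e3"])
receive_all(sink, flaky_store)
print(stored)  # ['e1', 'e2', 'e3']
```

Because the receiver initiates every transfer, it can come back up on any node and resume polling, and because the ack follows the store, a failure between the two leads to redelivery rather than loss. Note that this toy loop would retry forever if `store` never succeeded.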

Issue Links

is related to

SPARK-1647 Prevent data loss when Streaming driver goes down (Closed)

Sub-Tasks

1. Make Flume pull data from source, rather than the current push model (Resolved, Hari Shreedharan)
2. Make receiver store data reliably to avoid data-loss on executor failures (Closed, Hari Shreedharan)

People

Assignee: Tathagata Das

Reporter: Hari Shreedharan

Votes: 0

Watchers: 8

Dates

Created: 26/Apr/14 19:22

Updated: 01/Aug/14 20:47

Resolved: 01/Aug/14 20:47
