[SPARK-18682] Batch Source for Kafka - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.1.1, 2.2.0
Component/s: SQL, Structured Streaming
Labels: None
Target Version/s: 2.2.0

Description

Today you can start a stream that reads from Kafka. However, because Kafka retains data for a configurable period, sometimes you may simply want to read all of the data that is available now. We should therefore add a batch version that works with spark.read as well.

The options should be the same as those of the streaming Kafka source, with the following differences:

startingOffsets should default to earliest and should not allow latest (which would always yield an empty result).
endingOffsets should also be allowed and should default to latest. It should accept the same assign JSON format as startingOffsets.
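
A minimal sketch of the proposed option rules, written in Python for brevity. The helper resolve_batch_offsets is hypothetical and not part of Spark. It assumes only the assign JSON convention of the Spark Kafka source, where -2 stands for earliest and -1 for latest inside the per-partition offset map. It enforces just the rules stated above: the defaults, and the ban on a latest start, whether given as the keyword or as -1 in the JSON.

```python
import json

# Assumption: in the assign-style JSON of the Spark Kafka source,
# -2 means "earliest" and -1 means "latest".
LATEST_SENTINEL = -1


def _parse(value):
    """Return "earliest"/"latest" unchanged; otherwise decode the assign JSON."""
    if value in ("earliest", "latest"):
        return value
    # e.g. '{"topicA": {"0": 23, "1": -2}}' -> {"topicA": {"0": 23, "1": -2}}
    return json.loads(value)


def resolve_batch_offsets(options):
    """Hypothetical helper: apply the proposed batch defaults and checks.

    Returns (start, end), each either a keyword string or a parsed dict.
    """
    start = _parse(options.get("startingOffsets", "earliest"))
    end = _parse(options.get("endingOffsets", "latest"))

    # Starting at "latest" would always produce an empty batch, so reject it.
    if start == "latest":
        raise ValueError("startingOffsets=latest is not allowed for batch reads")
    # The same rule applies to a -1 (latest) offset inside the JSON form.
    if isinstance(start, dict):
        for topic, partitions in start.items():
            for partition, offset in partitions.items():
                if offset == LATEST_SENTINEL:
                    raise ValueError(
                        f"startingOffsets: -1 (latest) not allowed for {topic}-{partition}"
                    )
    return start, end
```

In Spark itself these values arrive as string options on a reader, for example .option("startingOffsets", "earliest") on spark.read.format("kafka").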

It would be really good if operations like .limit(n) were enough to prevent all of the data from being read (this might already work).
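
One way a planner could avoid reading everything for .limit(n) is to shrink the per-partition offset ranges before any data is fetched. The sketch below is a hypothetical illustration, not Spark's implementation: trim_ranges_for_limit and its (topic, partition, start, end) tuples with an exclusive end are assumptions. Kafka offsets are not guaranteed to be contiguous (log compaction and transaction markers leave gaps), so a trimmed range may hold fewer than n records and the limit would still have to be applied on top of the scan.

```python
def trim_ranges_for_limit(ranges, n):
    """Hypothetical planner step: shrink [start, end) offset ranges so that
    together they span at most n offsets.

    ranges: list of (topic, partition, start, end) tuples, end exclusive.
    Partitions are consumed in list order; later ones may be dropped entirely.
    """
    trimmed = []
    remaining = n
    for topic, partition, start, end in ranges:
        if remaining <= 0:
            break
        # Take as many offsets from this partition as the budget allows.
        take = min(end - start, remaining)
        if take > 0:
            trimmed.append((topic, partition, start, start + take))
            remaining -= take
    return trimmed
```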

Issue Links

duplicates: SPARK-18386 Batch mode SQL source for Kafka (Closed)

links to: [Github] Pull Request #16686 (tcondie)

People

Assignee: Tyson Condie
Reporter: Michael Armbrust
Votes: 0
Watchers: 8

Dates

Created: 02/Dec/16 01:35
Updated: 07/Feb/17 22:45
Resolved: 07/Feb/17 22:45