Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
3.5.1
Description
SimpleDataSourceStreamReader is a simplified version of the DataSourceStreamReader interface.
- It doesn’t require developers to reason about data partitioning.
- It doesn’t require getting the latest offset before reading data.
Three functions need to be defined:
1. Read data and return the end offset.
def read(self, start: Offset) -> Tuple[Iterator[Tuple], Offset]
2. Read data between the start and end offsets. This is required for exactly-once reads.
def read2(self, start: Offset, end: Offset) -> Iterator[Tuple]
3. Return the initial start offset of the streaming query.
def initialOffset(self) -> dict
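As an illustration of the three functions above, here is a minimal sketch of a reader that emits consecutive integers. The class name CounterStreamReader, the BATCH_SIZE constant, and the {"offset": n} offset layout are hypothetical; the sketch is a standalone class and does not subclass any Spark interface, so it only shows the shape of the contract.

```python
from typing import Dict, Iterator, Tuple

# Assumption: offsets are plain dicts, as the initialOffset signature suggests.
Offset = Dict[str, int]


class CounterStreamReader:
    """Hypothetical reader that emits consecutive integers."""

    BATCH_SIZE = 3  # illustrative number of rows returned per read() call

    def initialOffset(self) -> Offset:
        # The streaming query starts reading from zero.
        return {"offset": 0}

    def read(self, start: Offset) -> Tuple[Iterator[Tuple], Offset]:
        # Return whatever is available from `start`, plus the offset where it ends.
        begin = start["offset"]
        end = begin + self.BATCH_SIZE
        rows = iter([(i,) for i in range(begin, end)])
        return rows, {"offset": end}

    def read2(self, start: Offset, end: Offset) -> Iterator[Tuple]:
        # Deterministic replay of a known range, needed for exactly-once reads.
        return iter([(i,) for i in range(start["offset"], end["offset"])])


reader = CounterStreamReader()
rows, end = reader.read(reader.initialOffset())
first_batch = list(rows)
replayed = list(reader.read2(reader.initialOffset(), end))
```

Replaying the same offset range with read2 yields the same rows that read returned, which is the property the exactly-once requirement depends on.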
Implementation: Wrap the SimpleDataSourceStreamReader instance in a DataSourceStreamReader internally, so that prefetching and caching are transparent to the data source developer. Records prefetched in the Python process are sent to the JVM as Arrow record batches.
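The wrapping idea can be sketched roughly as follows. This is not the actual Spark implementation: PrefetchingWrapper, TinyReader, and the latestOffset/read/commit method layout are assumptions made for illustration, and the Arrow serialization step is omitted entirely.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Offset = Dict[str, int]  # assumption: offsets are plain dicts


class TinyReader:
    """Hypothetical simple reader that returns two integers per read() call."""

    def initialOffset(self) -> Offset:
        return {"offset": 0}

    def read(self, start: Offset) -> Tuple[Iterator[Tuple], Offset]:
        end = start["offset"] + 2
        return iter([(i,) for i in range(start["offset"], end)]), {"offset": end}

    def read2(self, start: Offset, end: Offset) -> Iterator[Tuple]:
        return iter([(i,) for i in range(start["offset"], end["offset"])])


class PrefetchingWrapper:
    """Hypothetical wrapper that hides prefetching and caching from the reader."""

    def __init__(self, simple_reader: TinyReader) -> None:
        self.simple_reader = simple_reader
        self.current: Optional[Offset] = None
        # Each cache entry is (start offset, end offset, prefetched rows).
        self.cache: List[Tuple[Offset, Offset, List[Tuple]]] = []

    def initialOffset(self) -> Offset:
        return self.simple_reader.initialOffset()

    def latestOffset(self) -> Offset:
        # Prefetch from the current position so the end offset is known
        # before the batch is planned, and keep the rows for later.
        if self.current is None:
            self.current = self.initialOffset()
        start = self.current
        rows, end = self.simple_reader.read(start)
        self.cache.append((start, end, list(rows)))
        self.current = end
        return end

    def read(self, start: Offset, end: Offset) -> Iterator[Tuple]:
        # Serve the batch from the cache when this exact range was prefetched.
        for cached_start, cached_end, rows in self.cache:
            if cached_start == start and cached_end == end:
                return iter(rows)
        # Cache miss (for example after a restart): replay the range.
        return self.simple_reader.read2(start, end)

    def commit(self, end: Offset) -> None:
        # Drop cached batches that end at or before the committed offset.
        self.cache = [e for e in self.cache if e[1]["offset"] > end["offset"]]


wrapper = PrefetchingWrapper(TinyReader())
start = wrapper.initialOffset()
end = wrapper.latestOffset()            # triggers the prefetch
served = list(wrapper.read(start, end))  # answered from the cache
wrapper.commit(end)                      # cache entry is released
replayed = list(wrapper.read(start, end))  # falls back to read2
```

In this sketch the data source developer writes only the simple reader; the wrapper decides when to call read versus read2, which is the sense in which prefetching and caching stay transparent.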