Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 4.0.0
Description
Allow users to read from a Python data source using `spark.read.format(...).load()` in PySpark. Users can extend the `DataSource` and `DataSourceReader` classes to create their own Python data source reader and use it in PySpark. For example:
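```python
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader

spark = SparkSession.builder.getOrCreate()

class MyReader(DataSourceReader):
    def read(self, partition):
        # Yield each row as a tuple matching the schema below.
        yield (0, 1)

class MyDataSource(DataSource):
    def schema(self):
        # Schema given as a DDL string.
        return "id INT, value INT"

    def reader(self, schema):
        return MyReader()

# Register the data source; its format name defaults to the class name.
spark.dataSource.register(MyDataSource)

df = spark.read.format("MyDataSource").load()
df.show()
```

```
+---+-----+
| id|value|
+---+-----+
|  0|    1|
+---+-----+
```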