Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
The goal is to:
- generate a Substrait plan in Python using Ibis
- ... wherein tables are specified using custom URLs
- use the Python API run_query to execute the plan
- ... against source data that is streamed from those URLs rather than pulled fully into local memory
The obstacles include:
- The API for constructing a data stream from the custom URLs is only available in C++
- The Python run_query function requires tables as input and cannot accept a RecordBatchReader, even if one could be constructed from a custom URL
- Writing custom Cython is not preferred
Some potential solutions:
- Make ExecuteSerializedPlan() directly usable from C++ so that construction of data sources need not be handled in Python. Passing a buffer from Python/Ibis down to C++ is much simpler and can be done without writing Cython
- Refactor NamedTableProvider from a lambda mapping names -> data source into a registry so that data source factories can be added from C++ and then referenced by name from Python
- Extend run_query to support non-Table sources and require the user to write a Python mapping from URLs to pa.RecordBatchReader
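To make the registry idea in the second option more concrete, here is a rough sketch in plain Python. It is not Arrow's API: SourceRegistry, register, open_source, and demo_source are hypothetical names, the "demo" URL scheme is made up, and a generator of row lists stands in for a RecordBatchReader. The only point is that factories are registered once and later resolved by name, and that the data is produced one batch at a time.

```python
from typing import Callable, Dict, Iterator, List

# Stand-in for a record batch: a list of row dicts. In Arrow this role
# would be played by a RecordBatch pulled from a RecordBatchReader.
Batch = List[dict]
SourceFactory = Callable[[str], Iterator[Batch]]


class SourceRegistry:
    """Hypothetical registry mapping URL schemes to streaming source factories."""

    def __init__(self) -> None:
        self._factories: Dict[str, SourceFactory] = {}

    def register(self, scheme: str, factory: SourceFactory) -> None:
        # Factories are added once (in the proposal, from C++).
        self._factories[scheme] = factory

    def open_source(self, url: str) -> Iterator[Batch]:
        # Sources are later looked up by name (in the proposal, from Python).
        scheme, sep, _ = url.partition("://")
        if not sep or scheme not in self._factories:
            raise KeyError(f"no source factory registered for {url!r}")
        return self._factories[scheme](url)


def demo_source(url: str) -> Iterator[Batch]:
    """Made-up factory: lazily yields three two-row batches for any URL."""
    for start in range(0, 6, 2):
        yield [{"url": url, "row": i} for i in range(start, start + 2)]


registry = SourceRegistry()
registry.register("demo", demo_source)

# Batches are produced on demand; nothing is materialized up front.
stream = registry.open_source("demo://bucket/table")
first = next(stream)
```

In this shape, the plan only needs to carry the URL; whichever side owns the registry decides how the stream is constructed.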