Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Description
Currently, many classes in pyarrow behave strangely from the Python user's point of view: they are neither subclassable nor monkey-patchable.
>>> import pyarrow as pa
>>> class MyTable(pa.Table):
...     pass
...
>>> table = MyTable.from_arrays([], [])
>>> type(table)
<class 'pyarrow.lib.Table'>
The factory method did not return an instance of our subclass...
Never mind, let's monkey-patch Table:
>>> pa.TableOriginal = pa.Table
>>> pa.Table = MyTable
>>> table = pa.Table.from_arrays([], [])
>>> type(table)
<class 'pyarrow.lib.Table'>
OK, that did not work either.
Let's be sneaky:
>>> table.__class__ = MyTable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __class__ assignment only supported for heap types or ModuleType subclasses
>>>
There is currently no way to modify or extend the behaviour of a Table instance. Users can use only what pyarrow provides out of the box. This is likely to be a source of frustration for many Python users.
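The TypeError above is not specific to pyarrow: CPython rejects `__class__` assignment for any instance of a statically defined extension type. A minimal sketch using only built-ins (the names `Plain`, `PlainSub` and `MyInt` are made up for illustration, and the exact error wording varies between Python versions):

```python
class Plain:
    """An ordinary class defined in Python (a heap type)."""


class PlainSub(Plain):
    pass


class MyInt(int):
    pass


# Reassigning __class__ works between compatible classes defined in Python.
obj = Plain()
obj.__class__ = PlainSub
assert type(obj) is PlainSub

# The same trick fails when the instance's type is implemented in C,
# as int is here and as pyarrow.lib.Table is in the session above.
value = 1
try:
    value.__class__ = MyInt
except TypeError as exc:
    error = exc
else:
    error = None

print(type(error).__name__)
```

So the only place the concrete type can be chosen is at construction time, which is why the behaviour of the factory methods matters.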
The attached PR remedies this for the Table class:
>>> import pyarrow as pa
>>> class MyTable(pa.Table):
...     pass
...
>>> table = MyTable.from_arrays([], [])
>>> type(table)
<class '__main__.MyTable'>
>>>
>>> pa.TableOriginal = pa.Table
>>> pa.Table = MyTable
>>> table = pa.Table.from_arrays([], [])
>>> type(table)
<class '__main__.MyTable'>
>>>
Ideally, these modifications would be extended to the other Cython-defined classes in pyarrow, but since Table is likely to be the first interface most users interact with, I thought it would be a good place to start.
Keeping the changes limited to a single class should also keep merge conflicts manageable.
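For readers unfamiliar with the underlying pattern, here is a pure-Python analogy of the difference between the two behaviours shown above. This is only a sketch and not the code from the PR: `HardcodedTable`, `SubclassAwareTable` and `from_columns` are hypothetical names, and the real pyarrow classes are Cython extension types.

```python
class HardcodedTable:
    """Factory always builds the base class, whatever it was called on."""

    def __init__(self, columns):
        self.columns = list(columns)

    @classmethod
    def from_columns(cls, columns):
        # cls is ignored, so subclasses get a plain HardcodedTable back.
        return HardcodedTable(columns)


class SubclassAwareTable:
    """Factory builds whichever class it was called on."""

    def __init__(self, columns):
        self.columns = list(columns)

    @classmethod
    def from_columns(cls, columns):
        # cls is the class the method was invoked through.
        return cls(columns)


class MyHardcoded(HardcodedTable):
    pass


class MySubclassAware(SubclassAwareTable):
    pass


a = MyHardcoded.from_columns([])
b = MySubclassAware.from_columns([])

print(type(a).__name__)
print(type(b).__name__)
```

The first factory reproduces the current behaviour, including the failed monkey-patch: rebinding the module attribute does not help, because the inherited method still constructs the base class directly. The second corresponds to the behaviour the sessions above show with the PR applied.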