Apache Arrow / ARROW-6486

[Python] Allow subclassing & monkey-patching of Table

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

    Description

      Currently, many classes in pyarrow behave unexpectedly from a Python user's point of view: they are neither subclassable nor monkey-patchable.

       

      >>> import pyarrow as pa
      >>> class MyTable(pa.Table):
      ...     pass
      ...
      >>> table = MyTable.from_arrays([], [])
      >>> type(table)
      <class 'pyarrow.lib.Table'>

      The factory method did not return an instance of our subclass...
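This is the behaviour one would expect when a factory builds its result from the base class by name instead of from the class it was called on. The pure-Python sketch below mimics that pattern. `BaseTable` and `from_columns` are hypothetical stand-ins chosen for illustration, not pyarrow's actual implementation.

```python
class BaseTable:
    """Hypothetical stand-in for a class whose factory hard-codes its own type."""

    def __init__(self, columns):
        self.columns = list(columns)

    @classmethod
    def from_columns(cls, columns):
        # The base class is named explicitly, so `cls` is ignored and
        # subclasses always get a plain BaseTable back.
        return BaseTable(columns)


class MyTable(BaseTable):
    pass


table = MyTable.from_columns([])
print(type(table).__name__)  # BaseTable
```

Calling the factory on the subclass still succeeds, which is why the problem only shows up once `type(table)` is inspected.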

      Never mind, let's monkey-patch Table:

      >>> pa.TableOriginal = pa.Table
      >>> pa.Table = MyTable
      >>> table = pa.Table.from_arrays([], [])
      >>> type(table)
      <class 'pyarrow.lib.Table'>

       

      OK, that did not work either.
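Rebinding the name cannot help here, because it only changes what `pa.Table` refers to; the factory inherited by the subclass is still the original one. A minimal sketch of that effect, assuming a factory that hard-codes the base class (`BaseTable`, `from_columns` and the `lib` namespace are hypothetical stand-ins, not pyarrow objects):

```python
import types


class BaseTable:
    """Hypothetical stand-in for a class whose factory hard-codes its own type."""

    @classmethod
    def from_columns(cls, columns):
        table = BaseTable()  # hard-coded base class, `cls` is ignored
        table.columns = list(columns)
        return table


class MyTable(BaseTable):
    pass


# Hypothetical stand-in for a module namespace such as `pa`.
lib = types.SimpleNamespace(Table=BaseTable)

lib.TableOriginal = lib.Table
lib.Table = MyTable  # only the name binding changes

table = lib.Table.from_columns([])
print(lib.Table.__name__)    # MyTable
print(type(table).__name__)  # BaseTable
```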

      Let's be sneaky:

      >>> table.__class__ = MyTable
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: __class__ assignment only supported for heap types or ModuleType subclasses
      >>>
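The error is raised by CPython rather than by pyarrow: `__class__` can be reassigned between ordinary Python-defined (heap) classes, but not on instances of statically defined C-level types. The sketch below shows both cases using only built-ins. It uses `float` as a stand-in for a compiled extension type, on the assumption that `pyarrow.lib.Table` falls into the same category, as the error message suggests.

```python
class Plain:
    """Ordinary Python class (a heap type)."""


class Fancy(Plain):
    def describe(self):
        return "fancy"


obj = Plain()
obj.__class__ = Fancy  # allowed: both classes are defined in Python
print(obj.describe())  # fancy


class MyFloat(float):
    pass


value = 1.5
try:
    value.__class__ = MyFloat  # float is a static C-level type
except TypeError as exc:
    reassignment_error = exc  # exact message wording varies by Python version
else:
    reassignment_error = None

print(type(reassignment_error).__name__)  # TypeError
```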

       

      There is currently no way to modify or extend the behaviour of a Table instance: users can use only what pyarrow provides out of the box. This is likely to be a source of frustration for many Python users.

       

      The attached PR remedies this for the Table class:

      >>> import pyarrow as pa
      >>> class MyTable(pa.Table):
      ...     pass
      ...
      >>> table = MyTable.from_arrays([], [])
      >>> type(table)
      <class '__main__.MyTable'>
      >>>
      >>> pa.TableOriginal = pa.Table
      >>> pa.Table = MyTable
      >>> table = pa.Table.from_arrays([], [])
      >>> type(table)
      <class '__main__.MyTable'>
      >>>
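The PR's Cython changes are not reproduced here. In pure Python, the general pattern that yields this behaviour is a factory that constructs its result through `cls` rather than through a hard-coded class name. A minimal sketch, with `BaseTable`, `from_columns` and `n_columns` as hypothetical names:

```python
class BaseTable:
    """Hypothetical stand-in for a subclass-friendly table class."""

    def __init__(self, columns):
        self.columns = list(columns)

    @classmethod
    def from_columns(cls, columns):
        # Construct through `cls` so subclasses get instances of themselves.
        return cls(columns)


class MyTable(BaseTable):
    def n_columns(self):
        # Extra behaviour that only the subclass provides.
        return len(self.columns)


table = MyTable.from_columns([[1, 2], [3, 4]])
print(type(table).__name__)  # MyTable
print(table.n_columns())     # 2
```

Calling `BaseTable.from_columns(...)` directly still returns a plain `BaseTable`, so existing callers of the base class see no change.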

       

      Ideally, these modifications would be extended to the other Cython-defined classes of pyarrow, but since Table is likely to be the interface through which most users first interact with the library, I thought it would be a good place to start.

      Keeping the changes limited to a single class should also keep merge conflicts manageable.

            People

              Assignee: Unassigned
              Reporter: ARF
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 50m