  1. Apache Arrow
  2. ARROW-6486

[Python] Allow subclassing & monkey-patching of Table

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

      Description

      Currently, many classes in pyarrow behave unexpectedly for Python users: they are neither subclassable nor monkey-patchable.

       

      >>> import pyarrow as pa
      >>> class MyTable(pa.Table):
      ...     pass
      ...
      >>> table = MyTable.from_arrays([], [])
      >>> type(table)
      <class 'pyarrow.lib.Table'>

      The factory method did not return an instance of our subclass...

      Never mind, let's monkey-patch Table:

      >>> pa.TableOriginal = pa.Table
      >>> pa.Table = MyTable
      >>> table = pa.Table.from_arrays([], [])
      >>> type(table)
      <class 'pyarrow.lib.Table'>

       

      OK, that did not work either.
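
Both results above are consistent with a factory that names the base class explicitly instead of using the `cls` it receives. The following is a minimal pure-Python sketch of that pattern; it is an analogy only and assumes nothing about pyarrow's actual Cython internals. `Base`, `Sub`, `from_items` and `ns` are hypothetical stand-ins.

```python
from types import SimpleNamespace


class Base:
    """Hypothetical stand-in for a class whose factory ignores `cls`."""

    @classmethod
    def from_items(cls, items):
        # The class is named explicitly, so `cls` is never consulted.
        obj = Base.__new__(Base)
        obj.items = list(items)
        return obj


class Sub(Base):
    pass


# Calling the factory through the subclass still yields the base class.
via_subclass = Sub.from_items([1, 2])

# Rebinding the name in a namespace (the monkey-patching attempt) does not
# help either: the attribute lookup finds the same inherited factory.
ns = SimpleNamespace(Base=Base)  # stand-in for the `pa` module namespace
ns.Base = Sub
via_patch = ns.Base.from_items([1, 2])

print(type(via_subclass).__name__, type(via_patch).__name__)  # Base Base
```

In this sketch the subclass and the rebound name both reach the same classmethod, which is why the two attempts fail in the same way.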

      Let's be sneaky:

      >>> table.__class__ = MyTable
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: __class__ assignment only supported for heap types or ModuleType subclasses
      >>>
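
The restriction behind this error can be reproduced without pyarrow. As a small sketch, the snippet below contrasts an ordinary Python class (a heap type), where reassigning `__class__` is allowed, with the built-in `int` (a type defined in C), where CPython refuses it. `Plain`, `PlainChild` and `MyInt` are hypothetical names, and the exact wording of the error message may differ between Python versions.

```python
class Plain:
    """Ordinary Python class: a heap type."""


class PlainChild(Plain):
    pass


class MyInt(int):
    pass


obj = Plain()
obj.__class__ = PlainChild  # allowed: both classes are defined in Python
heap_ok = type(obj) is PlainChild

number = 5
try:
    number.__class__ = MyInt  # `int` is a static type defined in C
    static_ok = True
except TypeError:
    static_ok = False

print(heap_ok, static_ok)  # True False
```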

       

      There is currently no way to modify or extend the behaviour of a Table instance. Users can use only what pyarrow provides out of the box. This is likely to be a source of frustration for many Python users.

       

      The attached PR remedies this for the Table class:

      >>> import pyarrow as pa
      >>> class MyTable(pa.Table):
      ...     pass
      ...
      >>> table = MyTable.from_arrays([], [])
      >>> type(table)
      <class '__main__.MyTable'>
      >>>
      >>> pa.TableOriginal = pa.Table
      >>> pa.Table = MyTable
      >>> table = pa.Table.from_arrays([], [])
      >>> type(table)
      <class '__main__.MyTable'>
      >>>
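
The PR itself is not reproduced here. As a pure-Python illustration of the behaviour shown in the session above, a factory returns subclass instances when it instantiates the `cls` it is called on. This is a sketch under that assumption, not pyarrow's implementation; `Base`, `Sub`, `from_items` and `total` are hypothetical names.

```python
class Base:
    """Hypothetical stand-in for a class with a subclass-aware factory."""

    @classmethod
    def from_items(cls, items):
        # Instantiate whatever class the factory was called on.
        obj = cls.__new__(cls)
        obj.items = list(items)
        return obj


class Sub(Base):
    def total(self):
        # Behaviour added by the subclass is available on factory results.
        return sum(self.items)


table = Sub.from_items([1, 2, 3])
print(type(table).__name__, table.total())  # Sub 6
```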

       

      Ideally, these modifications would be extended to the other Cython-defined classes of pyarrow, but given that Table is likely to be the interface through which most users begin their interaction, I thought this would be a good start.

      Keeping the changes limited to a single class should also keep merge conflicts manageable.

              People

              • Assignee: Unassigned
              • Reporter: ARF1 ARF

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m