Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Having the builder classes available from Python would be very helpful. Currently, constructing an Arrow array always needs a Python list or NumPy array as an intermediate. As the builders, in combination with jemalloc, are very efficient at building up non-chunked memory, it would be nice to use them directly in certain cases.
The most useful builders are the StringBuilder and DictionaryBuilder as they provide functionality to create columns that are not easily constructed using NumPy methods in Python.
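To illustrate why a DictionaryBuilder is awkward to reproduce with NumPy methods, here is a minimal pure-Python sketch of the idea behind dictionary building: each distinct value is stored once and every row keeps only an integer index. The class name `DictionaryBuilderSketch` and its methods are hypothetical and chosen for illustration; this is not the Arrow C++ implementation or a pyarrow API.

```python
class DictionaryBuilderSketch:
    """Hypothetical stand-in for a dictionary builder (illustration only)."""

    def __init__(self):
        self._memo = {}        # value -> position in the dictionary
        self.dictionary = []   # distinct values, in first-seen order
        self.indices = []      # one entry per appended row; None marks a null

    def append(self, value):
        if value is None:
            self.indices.append(None)
            return
        index = self._memo.get(value)
        if index is None:
            # First time this value is seen: add it to the dictionary.
            index = len(self.dictionary)
            self._memo[value] = index
            self.dictionary.append(value)
        self.indices.append(index)

    def finish(self):
        # Return the two parts that together describe the encoded column.
        return self.indices, self.dictionary


dict_builder = DictionaryBuilderSketch()
for item in ["apple", "pear", "apple", None, "pear"]:
    dict_builder.append(item)
indices, dictionary = dict_builder.finish()
```

Values are appended one at a time, so no intermediate list of all rows is required before encoding starts.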
The basic approach would be to wrap the C++ classes in https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd so that they can be used from Cython. Afterwards, we should start a new file python/pyarrow/builder.pxi containing classes that take typical Python objects such as str and pass them on to the C++ classes. Finally, these classes should return (Python-accessible) pyarrow.Array instances.
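The shape of such a Python-facing class could look roughly like the sketch below. It is written in plain Python so that it runs without Arrow installed: it accumulates UTF-8 bytes, offsets, and validity flags in the spirit of a variable-length string column, and `finish()` returns those raw parts instead of a pyarrow.Array. The class name `StringBuilderSketch` and the method names `append` and `finish` are assumptions made for illustration; a real builder.pxi would forward these calls to the wrapped C++ builder.

```python
class StringBuilderSketch:
    """Hypothetical pure-Python model of a string builder (illustration only)."""

    def __init__(self):
        self._data = bytearray()   # concatenated UTF-8 bytes of all values
        self._offsets = [0]        # offsets[i]..offsets[i + 1] delimits value i
        self._valid = []           # False where a null was appended

    def append(self, value):
        if value is None:
            self._valid.append(False)
        elif isinstance(value, str):
            self._data.extend(value.encode("utf-8"))
            self._valid.append(True)
        else:
            raise TypeError("expected str or None")
        # A null contributes no bytes, so its offset repeats the previous one.
        self._offsets.append(len(self._data))

    def __len__(self):
        return len(self._valid)

    def finish(self):
        # Hand out the accumulated parts and reset the builder for reuse.
        result = (list(self._offsets), bytes(self._data), list(self._valid))
        self.__init__()
        return result


builder = StringBuilderSketch()
builder.append("foo")
builder.append(None)
builder.append("héllo")
offsets, data, valid = builder.finish()
```

Resetting in `finish()` is a design choice of this sketch, so one builder can produce several arrays in sequence.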