Details

      Description

      Having the builder classes available from Python would be very helpful. Currently, constructing an Arrow array always requires a Python list or NumPy array as an intermediate. Because the builders, in combination with jemalloc, are very efficient at building up contiguous (non-chunked) memory, it would be nice to use them directly in certain cases.

      The most useful builders are the StringBuilder and DictionaryBuilder, as they make it possible to create columns that are not easily constructed using NumPy methods in Python.
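      To illustrate why a dictionary builder is awkward to emulate with NumPy, here is a minimal pure-Python sketch of incremental dictionary encoding. The class name DictionaryBuilderSketch and its methods are hypothetical stand-ins chosen for illustration; this is not the Arrow C++ DictionaryBuilder and not a pyarrow API.

```python
from typing import Dict, List, Optional, Tuple


class DictionaryBuilderSketch:
    """Hypothetical stand-in for illustration only, not an Arrow class."""

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._positions: Dict[str, int] = {}      # value -> slot in the dictionary
        self._dictionary: List[str] = []          # distinct values, first-seen order
        self._indices: List[Optional[int]] = []   # one entry per appended value

    def append(self, value: Optional[str]) -> None:
        """Append one value; None is recorded as a null index."""
        if value is None:
            self._indices.append(None)
            return
        index = self._positions.get(value)
        if index is None:
            # First occurrence: assign the next free dictionary slot.
            index = len(self._dictionary)
            self._positions[value] = index
            self._dictionary.append(value)
        self._indices.append(index)

    def finish(self) -> Tuple[List[Optional[int]], List[str]]:
        """Return (indices, dictionary) and reset the builder for reuse."""
        result = (self._indices, self._dictionary)
        self._reset()
        return result


builder = DictionaryBuilderSketch()
for value in ["a", "b", "a", None, "b"]:
    builder.append(value)
indices, dictionary = builder.finish()
```

      Each appended value costs one hash lookup, and the distinct values are stored only once, which is the property that makes a builder attractive for repetitive string columns.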

      The basic approach would be to expose the C++ builder classes in https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd so that they can be used from Cython. Afterwards, we should start a new file, python/pyarrow/builder.pxi, containing classes that take typical Python objects such as str and pass them on to the C++ classes. Finally, these classes should return (Python-accessible) pyarrow.Array instances.
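      The shape of such a wrapper class can be sketched in plain Python. The example below is only an assumption about what the interface might look like: the name StringBuilderSketch and the methods append and finish are hypothetical, and the C++ builder is replaced by a bytearray plus an offsets array that mimic Arrow's variable-length string layout. A real builder.pxi class would delegate to the C++ builder and return a pyarrow.Array instead of a tuple.

```python
from array import array
from typing import List, Optional, Tuple


class StringBuilderSketch:
    """Hypothetical interface sketch; the real class would wrap the C++ builder."""

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._validity: List[bool] = []   # False marks a null slot
        self._offsets = array("i", [0])   # start of each value in the data buffer
        self._data = bytearray()          # all values, UTF-8 encoded, back to back

    def append(self, value: Optional[str]) -> None:
        """Append a str, or None for a null entry."""
        if value is None:
            self._validity.append(False)
        elif isinstance(value, str):
            self._data.extend(value.encode("utf-8"))
            self._validity.append(True)
        else:
            raise TypeError(f"expected str or None, got {type(value).__name__}")
        # A null repeats the previous offset, so its length is zero.
        self._offsets.append(len(self._data))

    def __len__(self) -> int:
        return len(self._validity)

    def finish(self) -> Tuple[List[bool], List[int], bytes]:
        """Return (validity, offsets, data) and reset the builder."""
        result = (self._validity, self._offsets.tolist(), bytes(self._data))
        self._reset()
        return result


builder = StringBuilderSketch()
for value in ["foo", None, "é"]:
    builder.append(value)
length = len(builder)
validity, offsets, data = builder.finish()
```

      Values go straight into one contiguous buffer as they arrive, so no intermediate Python list or NumPy array of the whole column is needed.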

              People

              • Assignee: Donal Simmie (dsimmie)
              • Reporter: Uwe L. Korn (xhochy)
              • Votes: 0
              • Watchers: 5

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 50m