Apache Arrow / ARROW-1964

[Python] Expose Builder classes


Details

    Description

      Having the builder classes available from Python would be very helpful. Currently, constructing an Arrow array always needs a Python list or NumPy array as an intermediate. Because the builders, in combination with jemalloc, are very efficient at building up non-chunked memory, it would be nice to use them directly in certain cases.

      The most useful builders are the StringBuilder and DictionaryBuilder as they provide functionality to create columns that are not easily constructed using NumPy methods in Python.
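
To illustrate why a DictionaryBuilder is awkward to reproduce with NumPy methods, here is a minimal pure-Python sketch of the append-and-encode pattern such a builder follows. The class name `DictionaryBuilderSketch` and its methods are hypothetical stand-ins for illustration only; they are not pyarrow or Arrow C++ APIs, and a real builder would write into Arrow buffers rather than Python lists.

```python
class DictionaryBuilderSketch:
    """Hypothetical stand-in for a dictionary builder (not a pyarrow class)."""

    def __init__(self):
        self._dictionary = []  # distinct values, in first-seen order
        self._lookup = {}      # value -> position in self._dictionary
        self._indices = []     # one index per appended value, None for null

    def append(self, value):
        # Reuse the existing index if the value was seen before,
        # otherwise register it as a new dictionary entry.
        index = self._lookup.get(value)
        if index is None:
            index = len(self._dictionary)
            self._lookup[value] = index
            self._dictionary.append(value)
        self._indices.append(index)

    def append_null(self):
        self._indices.append(None)

    def finish(self):
        # Return copies so the caller cannot mutate builder state.
        return list(self._indices), list(self._dictionary)


builder = DictionaryBuilderSketch()
for value in ["a", "b", "a"]:
    builder.append(value)
builder.append_null()
indices, dictionary = builder.finish()
```

After this runs, `indices` holds one entry per appended value and `dictionary` holds each distinct string once, which is the incremental encoding that is hard to express as a single NumPy operation.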

      The basic approach would be to wrap the C++ classes in https://github.com/apache/arrow/blob/master/python/pyarrow/includes/libarrow.pxd so that they can be used from Cython. Afterwards, we should start a new file, python/pyarrow/builder.pxi, containing classes that take typical Python objects like str and pass them on to the C++ classes. Finally, these classes should return (Python-accessible) pyarrow.Array instances.
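
The Python-facing interface described above could look roughly like the following sketch. It is written in plain Python instead of Cython so that it runs without building Arrow; the class name `StringBuilderSketch`, its method names, and the tuple returned by `finish()` are assumptions made for illustration. A real wrapper in builder.pxi would forward each call to the C++ builder and return a pyarrow.Array instead of raw Python objects.

```python
class StringBuilderSketch:
    """Hypothetical stand-in for a wrapped string builder (not a pyarrow class).

    Values are accumulated in a layout similar to Arrow's variable-length
    string format: one contiguous UTF-8 data buffer plus offsets into it.
    Validity is kept as a list of bools here; Arrow itself uses a bitmap.
    """

    def __init__(self):
        self._offsets = [0]        # offsets[i]..offsets[i + 1] delimit value i
        self._data = bytearray()   # concatenated UTF-8 bytes of all values
        self._valid = []           # False marks a null slot

    def append(self, value):
        if value is None:
            self.append_null()
            return
        if not isinstance(value, str):
            raise TypeError("expected str or None")
        self._data.extend(value.encode("utf-8"))
        self._offsets.append(len(self._data))
        self._valid.append(True)

    def append_null(self):
        # A null occupies no bytes, so the offset does not advance.
        self._offsets.append(len(self._data))
        self._valid.append(False)

    def finish(self):
        # Hand back the accumulated buffers and reset for reuse.
        result = (self._offsets, bytes(self._data), self._valid)
        self._offsets, self._data, self._valid = [0], bytearray(), []
        return result


builder = StringBuilderSketch()
builder.append("foo")
builder.append_null()
builder.append("é")
offsets, data, valid = builder.finish()
```

The point of the design is that values go straight into one contiguous buffer as they arrive, so no intermediate Python list or NumPy array of strings is ever materialized.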

            People

              dsimmie Donal Simmie
              uwe Uwe Korn
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 50m