  1. Apache Arrow
  2. ARROW-2787

[Python] Memory Issue passing table from python to c++ via cython


Details

    Description

      I wanted to create a simple example of reading a table in Python and passing it to C++, but either I'm doing something wrong or there is a memory issue. When the table gets to C++ and I print out the column names, it also prints a lot of junk and what looks like pydocs. Let me know if you need any more info. Thanks!

      demo.py

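      import numpy
      import pandas as pd
      import pyarrow as pa  # needed for pa.Table below
      from absl import app
      from psy.automl import cyth
      
      def main(argv):
          sup = pd.DataFrame({
              'int': [1, 2],
              'str': ['a', 'b'],
          })
          table = pa.Table.from_pandas(sup)
          cyth.c_t(table)
      
      if __name__ == '__main__':
          # Run main() through absl
          app.run(main)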
      

      cyth.pyx

      import pandas as pd
      import pyarrow as pa
      from pyarrow.lib cimport *
      
      cdef extern from "cyth.h" namespace "psy":
       void t(shared_ptr[CTable])
      
      def c_t(obj):
          # These prints work:
          # for i in range(obj.num_columns):
          #     print(obj.column(i).name)
          cdef shared_ptr[CTable] tbl = pyarrow_unwrap_table(obj)
          t(tbl)
      

      cyth.h

      #include <iostream>
      #include <string>
      #include "arrow/api.h"
      #include "arrow/python/api.h"
      #include "Python.h"
      
      namespace psy {
      
      void t(std::shared_ptr<arrow::Table> pytable) {
        // This works
        std::cout << "NUM" << pytable->num_columns();
      
        // This prints a lot of garbage
        for (int i = 0; i < pytable->num_columns(); i++) {
          std::cout << pytable->column(i)->name();
        }
      }
      
      }  // namespace psy
      

       

            People

              apitrou Antoine Pitrou
              weazelb0y Joseph Toth