  1. Apache Arrow
  2. ARROW-2913

[Python] Exported buffers don't expose type information


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: C++, Python
    • Labels: None

    Description

      Calling the buffers() method on an array returns the list of buffers backing that array, but those buffers lose their type information:

      >>> import pyarrow as pa
      >>> a = pa.array(range(10))
      >>> a.type
      DataType(int64)
      >>> buffers = a.buffers()
      >>> [(memoryview(buf).format, memoryview(buf).shape) for buf in buffers]
      [('b', (2,)), ('b', (80,))]
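
The two shapes above are consistent with a primitive array being backed by a validity bitmap followed by a data buffer. A minimal sketch of the arithmetic in plain Python, assuming one validity bit per element, 8 bytes per int64 value, and no padding (the helper name `expected_buffer_sizes` is hypothetical, not a pyarrow function):

```python
def expected_buffer_sizes(length, itemsize):
    """Return (validity_bitmap_bytes, data_bytes) for an unpadded primitive array."""
    bitmap_bytes = (length + 7) // 8   # one bit per element, rounded up to whole bytes
    data_bytes = length * itemsize     # fixed-width values stored back to back
    return bitmap_bytes, data_bytes

# 10 int64 values: 2 bitmap bytes and 80 data bytes
sizes = expected_buffer_sizes(10, 8)
print(sizes)
```

Both buffers are exported as plain bytes (format 'b'), so a consumer sees only these byte counts and nothing about the int64 element type.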
      

      By contrast, NumPy exposes type information through the Python buffer protocol:

      >>> a = pa.array(range(10))
      >>> memoryview(a.to_numpy()).format
      'l'
      >>> memoryview(a.to_numpy()).shape
      (10,)
      

      Exposing type information on buffers could be important for third-party systems such as Dask/distributed, which could use it for type-based data compression when serializing.
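
To illustrate why the element type matters for compression, here is a rough sketch using only the standard library. It is not how Dask/distributed works; the function names are hypothetical, and delta encoding followed by zlib is just one example of a codec that requires knowing the buffer holds int64 values:

```python
import array
import zlib

def compress_untyped(raw):
    """Compress a buffer with no knowledge of its element type."""
    return zlib.compress(raw)

def compress_int64(raw):
    """Delta-encode a buffer known to hold int64 values, then compress it.

    Assumes consecutive differences fit in int64 (no overflow handling).
    """
    values = array.array("q")
    values.frombytes(raw)
    deltas = array.array("q", values[:1])              # keep the first value as-is
    deltas.extend(b - a for a, b in zip(values, values[1:]))
    return zlib.compress(deltas.tobytes())

def decompress_int64(blob):
    """Invert compress_int64 and return the original bytes."""
    deltas = array.array("q")
    deltas.frombytes(zlib.decompress(blob))
    values = array.array("q")
    total = 0
    for d in deltas:
        total += d                                     # running sum undoes the deltas
        values.append(total)
    return values.tobytes()

# Stand-in for the data buffer of an int64 array holding 0..999.
raw = array.array("q", range(1000)).tobytes()
print(len(compress_untyped(raw)), len(compress_int64(raw)))
```

A serializer that only sees format 'b' cannot safely choose the second path, because it has no way to know the bytes are 8-byte integers.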

      Since our C++ buffers are not typed, it's not obvious how to solve this. Should we return tensors instead?
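
Independently of how that question is answered, a consumer that knows the array type from elsewhere can reattach it itself with `memoryview.cast`. The sketch below uses only the standard library and a `bytes` object as a stand-in for an exported buffer; `FORMAT_FOR_TYPE` and `typed_view` are hypothetical names, and the mapping from type names to struct format codes is an assumption, not something pyarrow provides:

```python
import array

# Hypothetical mapping from primitive type names to struct format codes.
FORMAT_FOR_TYPE = {"int8": "b", "int16": "h", "int32": "i", "int64": "q", "double": "d"}

def typed_view(raw, type_name):
    """Reinterpret an untyped, contiguous byte buffer as a typed memoryview."""
    view = memoryview(raw)
    if view.format not in ("b", "B", "c"):
        view = view.cast("B")          # cast() needs a byte format on one side
    return view.cast(FORMAT_FOR_TYPE[type_name])

# Stand-in for the 80-byte data buffer of an int64 array of 10 values.
raw = array.array("q", range(10)).tobytes()
view = typed_view(raw, "int64")
print(view.format, view.shape)
```

This only works when the type travels out of band, which is exactly the information the exported buffers do not carry.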

          People

            Assignee: Unassigned
            Reporter: Antoine Pitrou (apitrou)
            Votes: 0
            Watchers: 3
