  1. Apache Arrow
  2. ARROW-10709

[Python] Difficult to make an efficient zero-copy file reader in Python

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Python

    Description

      file.read_buffer() allows efficient data transport with zero memory copies (benchmarking has confirmed this, very nice!).

      However, when the file is backed by a Python object (via PythonFile), file.read_buffer() calls PythonFile.read() via PyReadableFile::Read. A 'normal' file.read(), which does copy memory, also calls PythonFile.read(), and only a bytes object is accepted as the result (PyReadableFile::Read uses PyBytes_Check).
      This makes it hard to create a single file object in Python land that supports a normal .read() (and thus needs to return a bytes object) and also supports a zero-copy route where .read() can return a memory view.
      Possibly the strict PyBytes_Check can be lifted by also trying PyObject_GetBuffer.
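
      The difference between the two checks can be sketched in pure Python. This is only an analogy, not Arrow's C++ code: ViewReader, strict_accept and relaxed_accept are hypothetical names, isinstance(..., bytes) stands in for PyBytes_Check, and memoryview(...) stands in for PyObject_GetBuffer.

```python
class ViewReader:
    """Hypothetical file-like object whose read() returns memoryview slices."""

    def __init__(self, data):
        self._view = memoryview(data)
        self._pos = 0

    def read(self, nbytes=-1):
        size = len(self._view)
        if nbytes is None or nbytes < 0:
            end = size
        else:
            end = min(self._pos + nbytes, size)
        chunk = self._view[self._pos:end]  # slicing a memoryview copies nothing
        self._pos = end
        return chunk


def strict_accept(result):
    # Analogy for the strict check: only a real bytes object passes.
    if not isinstance(result, bytes):
        raise TypeError("read() must return bytes")
    return result


def relaxed_accept(result):
    # Analogy for the proposed relaxation: accept any object that exposes
    # the buffer protocol. memoryview() raises TypeError otherwise.
    return memoryview(result)


source = bytearray(b"hello world")
chunk = ViewReader(source).read(5)

try:
    strict_accept(chunk)
    strict_ok = True
except TypeError:
    strict_ok = False

view = relaxed_accept(chunk)
```

      Under the strict check the memoryview is rejected, so the Python object would have to copy its data into bytes first. Under the relaxed check the view still refers to the memory of source, which is the zero-copy route described above.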

            People

              maartenbreddels Maarten Breddels
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 10m