Apache Arrow / ARROW-9526

[Python] Converting a memory-mapped Arrow file to Parquet loads everything into RAM


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++, Python
    • Labels: None

    Description

      When converting a memory-mapped Arrow file to a Parquet file, the whole table is loaded into RAM. This defeats the purpose of memory mapping.

      If this is not a bug, is there a proper way to convert a memory-mapped Arrow file to Parquet without using excessive memory?

       

      Example code:

          import pyarrow as pa
          import pyarrow.parquet as pq

          source = pa.memory_map(path_to_arrow_file, 'r')
          table = pa.ipc.RecordBatchFileReader(source).read_all()
          # The following line will load the whole thing into RAM
          pq.write_table(table, path_to_parquet)


          People

            Assignee: Unassigned
            Reporter: Sep Dehpour (seperman)
            Votes: 0
            Watchers: 3
