Apache Arrow / ARROW-1067

Write to parquet with InMemoryOutputStream


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 0.4.0
    • Fix Version/s: None
    • Component/s: Python
    • Environment: Debian 8.5, Anaconda Python 3.6, pyarrow 0.4.0a0

    Description

      When I run the following code (taken from the docs), Python crashes during the pq.write_table call. How would I go about writing a Parquet file to an in-memory buffer (e.g. for upload to Azure Data Lake)?

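      import pyarrow as pa
      import pyarrow.parquet as pq

      # df is a pandas DataFrame and adl an Azure Data Lake filesystem client,
      # both created earlier (not shown); my_file_path is the destination path.
      table = pa.Table.from_pandas(df, timestamps_to_ms=True)
      with adl.open(my_file_path, 'wb') as f:
          output = pa.InMemoryOutputStream()  # in-memory sink (pyarrow 0.4 API)
          pq.write_table(table, output)  # crashes here
          f.write(output.get_result().to_pybytes())  # copy the buffer to ADL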
      


          People

            Assignee: Unassigned
            Reporter: Chase Slater (chaseos)
            Votes: 0
            Watchers: 3
