ARROW-13654

[C++][Parquet] Appending a FileMetaData object to itself explodes memory

Details

    Description

      Write a tiny Parquet file, then read its metadata back to obtain a FileMetaData object:

      import pyarrow as pa
      import pyarrow.parquet as pq
      
      table = pa.table({'a': [1, 2, 3], 'b': [4, 5, 6]})
      pq.write_table(table, "test_file_for_metadata.parquet")
      metadata = pq.read_metadata("test_file_for_metadata.parquet")
      
      metadata.append_row_groups(metadata)
      

      The last line, which calls AppendRowGroups to append the metadata object to itself, never returns and its memory usage keeps growing (I killed the process when it had reached 10 GB).

      This is not a useful thing to do, but I still wouldn't expect it to blow up, since one can do it by accident (I was actually trying it in an attempt to create a large FileMetaData object).
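
      One plausible explanation, which this report does not confirm, is that the append loop walks the source's row groups by their live count while adding to the same collection, so the end of the loop keeps moving away. The plain-Python sketch below only illustrates that pattern and one possible remedy (fixing the count before the loop). The names `append_unbounded`, `append_snapshot`, `row_groups` and `max_steps` are hypothetical and are not part of Arrow or pyarrow.

```python
# Plain-Python analogy; this is NOT the Arrow implementation.
# `row_groups` stands in for the row-group list held by a FileMetaData object.

def append_unbounded(dst, src, max_steps=1000):
    """Append src's items to dst, re-reading len(src) on every iteration.

    If dst is src, the length grows with every append, so the loop would
    never finish on its own; max_steps caps it so this demo terminates.
    """
    i = 0
    while i < len(src) and i < max_steps:
        dst.append(src[i])
        i += 1
    return i  # number of items appended


def append_snapshot(dst, src):
    """Append src's items to dst, fixing the item count before the loop."""
    n = len(src)  # snapshot: growth of src during the loop is ignored
    for i in range(n):
        dst.append(src[i])
    return n


row_groups = ["rg0"]
steps = append_unbounded(row_groups, row_groups)
print(steps, len(row_groups))  # 1000 1001 (stopped only by max_steps)

row_groups = ["rg0"]
steps = append_snapshot(row_groups, row_groups)
print(steps, len(row_groups))  # 1 2
```

      With the snapshot variant a self-append simply duplicates the existing entries; rejecting a self-append with an error would be another reasonable choice.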

            People

              apitrou Antoine Pitrou
              jorisvandenbossche Joris Van den Bossche

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m