PARQUET-209

Enhance ParquetWriter to expose the in-memory size of the writer object

Details

    • Type: Wish
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      While using ParquetWriter, there is no way to check or estimate the size of the output file before closing the writer and writing its content to disk. Such an estimate would be useful when we want to close files and upload them once they reach a minimum size threshold. Since ParquetWriter keeps everything in memory and only writes it out to disk at the very end, when the writer is closed, it is not possible to estimate the output file size before closing the writer.

      Based on the Parquet documentation, the data is written to the in-memory object in its final format, meaning that the size of the object in memory is very close to the final size on disk. It would be great if you could expose the current in-memory size of the ParquetWriter object. Such a size will differ from the final output size, because the schema and other metadata are added at the end of the file, but it still gives a close estimate of the output file size, which would be very useful when reading/writing streams.
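
      To illustrate the intended use, here is a minimal sketch of size-based file rollover. Everything in it is an assumption made for illustration: SizeReportingWriter and its estimatedSizeBytes() method stand in for the accessor requested above and are not an existing Parquet API, FakeWriter is a toy writer that counts characters instead of encoding Parquet pages, and RollingWriter is a hypothetical helper.

```java
import java.util.function.Supplier;

// Hypothetical interface: models the size accessor this issue asks for.
// It is NOT part of the Parquet API.
interface SizeReportingWriter<T> {
    void write(T record);
    long estimatedSizeBytes(); // bytes buffered so far
    void close();
}

// Toy stand-in for a real writer: treats each character as one byte.
final class FakeWriter implements SizeReportingWriter<String> {
    private long size = 0;
    public void write(String record) { size += record.length(); }
    public long estimatedSizeBytes() { return size; }
    public void close() { /* a real writer would flush to disk here */ }
}

// Hypothetical helper: closes the current writer once its estimated size
// reaches the threshold, and lazily opens a new one on the next write.
final class RollingWriter<T> {
    private final Supplier<SizeReportingWriter<T>> factory;
    private final long thresholdBytes;
    private SizeReportingWriter<T> current;
    private int closedFiles = 0;

    RollingWriter(Supplier<SizeReportingWriter<T>> factory, long thresholdBytes) {
        this.factory = factory;
        this.thresholdBytes = thresholdBytes;
    }

    void write(T record) {
        if (current == null) {
            current = factory.get();
        }
        current.write(record);
        if (current.estimatedSizeBytes() >= thresholdBytes) {
            rollOver(); // the file is large enough to close and upload
        }
    }

    // Closes the last, possibly undersized, file if one is open.
    void close() {
        if (current != null) {
            rollOver();
        }
    }

    int closedFiles() { return closedFiles; }

    private void rollOver() {
        current.close();
        closedFiles++;
        current = null;
    }
}

class RollingWriterDemo {
    public static void main(String[] args) {
        RollingWriter<String> writer = new RollingWriter<>(FakeWriter::new, 10);
        writer.write("abcde");
        writer.write("abcde"); // reaches the 10-byte threshold, triggers a rollover
        writer.write("abcde");
        writer.close();
        System.out.println("files closed: " + writer.closedFiles());
    }
}
```

      With an accessor like this, the caller decides when to close and upload a file without having to close the writer first just to learn its size. The estimate would still exclude the footer metadata, so a threshold should be treated as approximate.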

          People

            Assignee: Unassigned
            Reporter: Reza Shiftehfar (reza79)