I think a brief design for this memory manager is:
Every new writer registers itself with the manager, so the manager has an overall view of all active writers. When a trigger condition is met (for example, after every 1000 rows), the manager notifies the writers to check their memory usage and flush if necessary.
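To make the design concrete, here is a minimal sketch of that registration/notification flow. All names here (MemoryManager, FlushableWriter, the budget and row-count parameters) are hypothetical, not existing Hive or Parquet APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface a writer would implement so the manager can query
// its buffer size and ask it to flush.
interface FlushableWriter {
    long bufferedSize();  // current dynamic buffer size in bytes
    void flush();         // spill buffered rows to disk
}

// Sketch of the proposed manager: writers register themselves, and every
// rowsPerCheck rows the manager compares total usage against a budget.
class MemoryManager {
    private final long totalBudget;   // overall memory budget across all writers
    private final long rowsPerCheck;  // trigger condition, e.g. every 1000 rows
    private final List<FlushableWriter> writers = new ArrayList<>();
    private long rowCount = 0;

    MemoryManager(long totalBudget, long rowsPerCheck) {
        this.totalBudget = totalBudget;
        this.rowsPerCheck = rowsPerCheck;
    }

    // every new writer registers itself with the manager
    void register(FlushableWriter w) {
        writers.add(w);
    }

    // called once per written row; fires a memory check when the condition is up
    void rowWritten() {
        if (++rowCount % rowsPerCheck == 0) {
            checkMemory();
        }
    }

    // sum usage across all writers and notify them to flush if over budget
    private void checkMemory() {
        long used = 0;
        for (FlushableWriter w : writers) {
            used += w.bufferedSize();
        }
        if (used > totalBudget) {
            for (FlushableWriter w : writers) {
                w.flush();
            }
        }
    }
}
```

One design choice worth noting: flushing all writers when over budget is the simplest policy; a smarter manager could flush only the largest writers, or shrink each writer's allocation proportionally.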
However, there is a problem specific to Parquet: Hive only has a wrapper around ParquetRecordWriter, and ParquetRecordWriter itself wraps the real writer (InternalParquetRecordWriter) in the Parquet project. Since measuring the dynamic buffer size and flushing are private behaviors of that real writer, I think we also have to add code to InternalParquetRecordWriter to implement the memory manager functionality.
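To illustrate the layering problem, here is a toy sketch of how a memory-check hook would have to be threaded through the wrapper layers. These classes only mirror the structure described above; the checkAndFlush() hook and both class bodies are invented for illustration and do not exist in Hive or Parquet today:

```java
// Stand-in for InternalParquetRecordWriter: the buffer size and flush
// decision live here, hidden behind the wrapper layers.
class InternalWriter {
    private long bufferedBytes = 0;

    // buffer grows as records are written (10 bytes per record, illustrative)
    void write(Object record) {
        bufferedBytes += 10;
    }

    // the hook that would have to be ADDED to the real internal writer:
    // expose the private buffer size and flush when over a threshold
    long checkAndFlush(long threshold) {
        if (bufferedBytes > threshold) {
            long flushed = bufferedBytes;
            bufferedBytes = 0;  // pretend a row group was spilled to disk
            return flushed;
        }
        return 0;
    }
}

// Stand-in for the ParquetRecordWriter / Hive wrapper layer: it must
// delegate the new hook so the memory manager on the Hive side can reach
// the internal writer's state.
class RecordWriterWrapper {
    private final InternalWriter internal = new InternalWriter();

    void write(Object record) {
        internal.write(record);
    }

    long checkAndFlush(long threshold) {
        return internal.checkAndFlush(threshold);
    }
}
```

The point of the sketch is that every wrapper layer needs a pass-through for the hook, which is why changing Hive alone is not enough.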
It seems this Jira cannot be fixed by changing Hive code alone.
I am not sure whether we should raise this problem in the Parquet project and fix it there, since it seems generic enough and not Hive-specific.
Any other ideas?