Running Hive export on a table with a large number of partitions causes both the metastore and the client to run out of memory. A copy of the entire set of partition objects ends up being held in each of the following places:
- (temporarily) Metastore MPartition objects
- List<Partition> that gets persisted before sending to thrift
- thrift copy of all of those partitions
- thrift copy of partitions
- deepcopy of above to create List<Partition> objects
- JSONObject that contains all of those above partition objects
- List<ReadEntity>, each element of which encapsulates one of the aforesaid partition objects.
This memory usage needs to be drastically reduced.
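This issue does not prescribe a fix. As a general illustration only, one common way to avoid holding every partition object at once is to keep just the lightweight keys (for example, partition names) in memory and materialize the heavy objects one fixed-size batch at a time. The sketch below is a minimal, self-contained example of that idea. The class name BatchedIterable and the fetcher function are hypothetical and are not part of the Hive or metastore API; the fetcher stands in for whatever call would load the full objects for a batch of keys.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Hive class): iterates over values that are
 * loaded lazily in fixed-size batches, so at most one batch of heavy
 * objects is referenced by the iterator at any time.
 */
final class BatchedIterable<K, V> implements Iterable<V> {
    private final List<K> keys;                        // lightweight keys, e.g. partition names
    private final int batchSize;                       // maximum number of values loaded at once
    private final Function<List<K>, List<V>> fetcher;  // assumed loader for one batch of keys

    BatchedIterable(List<K> keys, int batchSize, Function<List<K>, List<V>> fetcher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    @Override
    public Iterator<V> iterator() {
        return new Iterator<V>() {
            private int nextKey = 0;                                  // index of the first key not yet fetched
            private Iterator<V> current = Collections.emptyIterator(); // values of the batch being consumed

            @Override
            public boolean hasNext() {
                // Load the next batch only after the current one is exhausted;
                // the previous batch then becomes eligible for garbage collection.
                while (!current.hasNext() && nextKey < keys.size()) {
                    int end = Math.min(nextKey + batchSize, keys.size());
                    current = fetcher.apply(keys.subList(nextKey, end)).iterator();
                    nextKey = end;
                }
                return current.hasNext();
            }

            @Override
            public V next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

With this shape, a consumer can write each value out as it is produced instead of first collecting everything into a single list or JSON object. Whether this approach suits each of the copies listed above is an open question that this sketch does not answer.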