Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
- None
Description
Several HMS APIs return a list of partitions, e.g. get_partitions_ps(), get_partitions_by_names(), and add_partitions_req() with needResult=true. Each returned partition instance carries its own list of FieldSchemas as the partition schema:
org.apache.hadoop.hive.metastore.api.Partition -> org.apache.hadoop.hive.metastore.api.StorageDescriptor -> cols: list<org.apache.hadoop.hive.metastore.api.FieldSchema>
This can consume a large amount of memory for wide tables (e.g. with 2k cols), because the column list is repeated for every partition. See the heap histogram in IMPALA-11812 for an example.
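To make the scale concrete, here is a minimal Java sketch that counts how many distinct FieldSchema objects end up on the heap when every partition holds its own column list. The classes below are simplified stand-ins written for illustration, not the Thrift-generated HMS classes, and the partition count of 100 is an assumed figure; only the 2k column width comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class PartitionSchemaFootprint {
    // Simplified stand-ins for the HMS API classes (hypothetical, illustration only).
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    static class StorageDescriptor {
        List<FieldSchema> cols = new ArrayList<>();
    }

    static class Partition {
        StorageDescriptor sd = new StorageDescriptor();
    }

    // Builds partitions the way a deserialized response looks:
    // each partition gets freshly allocated FieldSchema objects.
    static List<Partition> buildPartitions(int numPartitions, int numCols) {
        List<Partition> parts = new ArrayList<>(numPartitions);
        for (int p = 0; p < numPartitions; p++) {
            Partition part = new Partition();
            for (int c = 0; c < numCols; c++) {
                part.sd.cols.add(new FieldSchema("col_" + c, "string"));
            }
            parts.add(part);
        }
        return parts;
    }

    // Counts FieldSchema objects by identity, so equal-but-separate copies count separately.
    static long countFieldSchemaInstances(List<Partition> parts) {
        Set<FieldSchema> distinct =
            Collections.newSetFromMap(new IdentityHashMap<FieldSchema, Boolean>());
        for (Partition part : parts) {
            distinct.addAll(part.sd.cols);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        // 100 partitions (assumed) x 2000 columns = 200000 FieldSchema objects.
        List<Partition> parts = buildPartitions(100, 2000);
        System.out.println(countFieldSchemaInstances(parts));
    }
}
```

The count grows as partitions times columns, even though the column definitions are identical across partitions in this sketch.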
Some engines, like Impala, don't actually use or respect the partition-level schema, so transmitting it wastes network and serde resources. It would be nice if these APIs provided an optional boolean flag for ignoring partition schemas, so that HMS clients (e.g. Impala) don't need to clear them later to save memory.
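The requested behavior could look roughly like the Java sketch below, which contrasts a server-side flag with the client-side clearing that is needed without it. Everything here is an assumption made for illustration: the flag name `skipPartitionSchema`, the helper methods, and the simplified Partition, StorageDescriptor, and FieldSchema classes are hypothetical and are not the actual HMS API or the implemented fix.

```java
import java.util.ArrayList;
import java.util.List;

class SkipPartitionSchemaSketch {
    // Simplified stand-ins for the HMS API classes (hypothetical, illustration only).
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    static class StorageDescriptor {
        String location;
        List<FieldSchema> cols = new ArrayList<>();
    }

    static class Partition {
        List<String> values = new ArrayList<>();
        StorageDescriptor sd = new StorageDescriptor();
    }

    // Test/demo helper: a partition with one value and numCols string columns.
    static Partition newPartition(String value, String location, int numCols) {
        Partition p = new Partition();
        p.values.add(value);
        p.sd.location = location;
        for (int c = 0; c < numCols; c++) {
            p.sd.cols.add(new FieldSchema("col_" + c, "string"));
        }
        return p;
    }

    // Hypothetical server side: build the response, omitting cols when the flag is set,
    // so the schema is never serialized or sent over the network.
    static List<Partition> toResponse(List<Partition> stored, boolean skipPartitionSchema) {
        List<Partition> out = new ArrayList<>(stored.size());
        for (Partition p : stored) {
            Partition copy = new Partition();
            copy.values = new ArrayList<>(p.values);
            copy.sd.location = p.sd.location;
            if (!skipPartitionSchema) {
                copy.sd.cols = new ArrayList<>(p.sd.cols);
            }
            out.add(copy);
        }
        return out;
    }

    // Client-side workaround without the flag: the schemas were already
    // transmitted and deserialized, and are only dropped afterwards.
    static void clearSchemas(List<Partition> received) {
        for (Partition p : received) {
            p.sd.cols = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        List<Partition> stored = new ArrayList<>();
        stored.add(newPartition("2024-01-01", "/warehouse/t/dt=2024-01-01", 2000));

        List<Partition> slim = toResponse(stored, true);
        System.out.println("cols with flag set: " + slim.get(0).sd.cols.size());

        List<Partition> full = toResponse(stored, false);
        clearSchemas(full);
        System.out.println("cols after client-side clearing: " + full.get(0).sd.cols.size());
    }
}
```

Both paths end with empty column lists on the client, but only the flag avoids the serialization, transfer, and temporary allocation of the schemas in the first place.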
Issue Links
- relates to: IMPALA-11812 Catalogd OOM due to lots of HMS FieldSchema instances (Resolved)
- links to