Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
The metastore currently stores one storage descriptor (SD) per partition, and the column schema, serde, and so on per SD.
Most of the time all partitions in a table share the same setup, and the only thing that differs across their SDs, column descriptors (CDs), and related objects is the location. In such cases there is no need to store these objects separately, or to send all of them to the client when many partitions are retrieved for a large table. Changing the storage may be too complex with respect to backward compatibility, and because DataNucleus is in the picture and controls the DB schema and persistence. At the very least, though, we can avoid sending lots of duplicate data to the client over the network: the Thrift protocol can be modified to omit duplicate data in a backward-compatible manner.
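
To make the idea concrete, here is a minimal sketch in Python of how a response could carry each distinct SD once and have partitions refer to it by index. Everything in it is an assumption made for illustration: the `StorageDescriptor`, `Partition`, and `CompactPartition` classes and the `compact` and `expand` functions are hypothetical stand-ins, not the actual metastore Thrift structs or API, and a real change would have to be expressed as optional fields in the Thrift IDL.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StorageDescriptor:
    """Hypothetical stand-in for a metastore SD (not the real Thrift struct)."""
    cols: Tuple[Tuple[str, str], ...]  # (column name, column type) pairs
    serde: str
    input_format: str
    output_format: str
    location: str


@dataclass(frozen=True)
class Partition:
    """Hypothetical full partition: carries its own complete SD."""
    values: Tuple[str, ...]
    sd: StorageDescriptor


@dataclass(frozen=True)
class CompactPartition:
    """Hypothetical wire form: location plus an index into a shared SD list."""
    values: Tuple[str, ...]
    location: str
    sd_index: int


def compact(partitions: List[Partition]):
    """Server side: send each distinct SD (ignoring location) only once."""
    shared: List[StorageDescriptor] = []
    index_by_sd: Dict[StorageDescriptor, int] = {}
    compacted: List[CompactPartition] = []
    for p in partitions:
        # Blank out the location so SDs that differ only by path compare equal.
        template = replace(p.sd, location="")
        idx = index_by_sd.get(template)
        if idx is None:
            idx = len(shared)
            index_by_sd[template] = idx
            shared.append(template)
        compacted.append(CompactPartition(p.values, p.sd.location, idx))
    return shared, compacted


def expand(shared: List[StorageDescriptor],
           compacted: List[CompactPartition]) -> List[Partition]:
    """Client side: rebuild full partitions from the shared SDs."""
    return [
        Partition(c.values, replace(shared[c.sd_index], location=c.location))
        for c in compacted
    ]
```

In this sketch, a table whose partitions differ only by location ends up with a single shared SD regardless of how many partitions are returned, while partitions with a genuinely different schema or serde simply get their own entry in the shared list. An older client that does not understand the compact form would still need the full per-partition SDs, which is why the compact fields would have to be optional.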