Hive / HIVE-6265

dedup Metastore data structures or at least protocol

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels:
      None

      Description

      The Metastore currently stores one storage descriptor (SD) per partition, and the column schema, SerDe, and related settings per SD.
      In most tables, all partitions share the same setup, and the only thing that differs in the SD, column descriptor (CD), and related objects is the location. In such cases there is no need to store these objects separately, or to send every copy to the client when many partitions are retrieved for a large table. Changing the storage may be too complex because of backward compatibility and because DataNucleus controls the database schema and persistence. At a minimum, though, we can avoid sending large amounts of duplicate data to the client over the network: the Thrift protocol can be modified to omit duplicate data in a backward-compatible manner.
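
      A minimal sketch of the wire-level idea, not the actual Metastore or Thrift code. It assumes a partition can be modeled as a plain dict with a "values" list and an "sd" dict whose only per-partition field is "location"; the names dedup_partitions, expand_partitions, shared_sds, and sd_index are hypothetical and chosen only for illustration. The server would send each distinct descriptor once plus a small per-partition record, and the client would rebuild the full objects.

```python
import json


def dedup_partitions(partitions):
    """Split partitions into unique location-free SDs plus compact per-partition records."""
    shared_sds = []      # each distinct descriptor, stored once
    index_by_key = {}    # canonical JSON of a descriptor -> its index in shared_sds
    compact = []
    for part in partitions:
        sd = dict(part["sd"])            # copy so the input is not mutated
        location = sd.pop("location")    # assumed to be the only per-partition field
        key = json.dumps(sd, sort_keys=True)
        if key not in index_by_key:
            index_by_key[key] = len(shared_sds)
            shared_sds.append(sd)
        compact.append({
            "values": part["values"],
            "location": location,
            "sd_index": index_by_key[key],
        })
    return {"shared_sds": shared_sds, "partitions": compact}


def expand_partitions(response):
    """Client side: rebuild full partition objects from the compact response."""
    result = []
    for part in response["partitions"]:
        sd = dict(response["shared_sds"][part["sd_index"]])
        sd["location"] = part["location"]
        result.append({"values": part["values"], "sd": sd})
    return result


def make_partition(day):
    """Build a sample partition; all field values here are made up for the example."""
    return {
        "values": [day],
        "sd": {
            "cols": [["id", "bigint"], ["name", "string"]],
            "serde": "example.serde.Class",
            "input_format": "example.InputFormat",
            "location": "/warehouse/t/ds=" + day,
        },
    }


partitions = [make_partition(d) for d in ("2014-01-01", "2014-01-02", "2014-01-03")]
compact = dedup_partitions(partitions)
restored = expand_partitions(compact)
```

      With three partitions that differ only in location, the compact form carries one shared descriptor instead of three, and expanding it yields the original list. In a Thrift setting, one plausible way to keep this backward compatible would be to carry the shared descriptors and the index in new optional fields, so that older clients could still be served the full form; that is an assumption of this sketch, not something the issue specifies.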

            People

            • Assignee: Unassigned
            • Reporter: Sergey Shelukhin (sershe)
            • Votes: 0
            • Watchers: 1
