Hive / HIVE-13345

LLAP: metadata cache takes too much space, esp. with bloom filters, due to Java/protobuf overhead


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      We currently cache Java objects, which have high overhead: average stripe metadata takes 200-500 KB on real files. Bloom filters blow up by more than 5x because they are stored as a list of Long objects, pushing this to as much as 5 MB per stripe. That is undesirable.
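
      To make the boxing overhead concrete, here is a rough back-of-the-envelope sketch. The per-object sizes are assumptions for a typical 64-bit HotSpot JVM with compressed oops, not measurements from LLAP, and the class and method names are hypothetical. It ignores list capacity slack and the protobuf wrapper objects, so it only accounts for part of the blow-up described above.

```java
final class BloomFilterFootprint {
    // Assumed sizes for a typical 64-bit HotSpot JVM with compressed oops.
    // These are illustrative assumptions, not measurements from LLAP.
    static final long BOXED_LONG_BYTES = 24;    // object header + padding + 8-byte value
    static final long REFERENCE_BYTES = 4;      // one list slot pointing at the Long
    static final long PRIMITIVE_LONG_BYTES = 8; // one element of a long[]

    /** Approximate heap bytes for a bloom filter bitset held as a list of boxed Longs. */
    static long boxedListBytes(long words) {
        return words * (BOXED_LONG_BYTES + REFERENCE_BYTES);
    }

    /** Approximate heap bytes for the same bitset held as a primitive long[]. */
    static long primitiveArrayBytes(long words) {
        return words * PRIMITIVE_LONG_BYTES;
    }

    public static void main(String[] args) {
        long words = 1024; // hypothetical bitset size, in 64-bit words
        long boxed = boxedListBytes(words);
        long primitive = primitiveArrayBytes(words);
        System.out.println("boxed list:  " + boxed + " bytes");
        System.out.println("primitive[]: " + primitive + " bytes");
        System.out.println("ratio:       " + (double) boxed / primitive);
    }
}
```

      Under these assumptions each 64-bit word costs 28 bytes as a boxed list element versus 8 bytes in a long[]. Spare capacity in the backing list and the surrounding message objects would add to that.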

      We should either create better objects for ORC (might be good in general) or store serialized metadata and deserialize when needed.
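
      The second option could look roughly like the sketch below: keep only the serialized bytes in the cache and deserialize on each read. This is a minimal illustration, not the LLAP cache API. SerializedMetadataCache and its methods are hypothetical names, and the deserializer is passed in as a plain function so that a protobuf parser (or any other decoder) could be plugged in.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: caches serialized metadata and deserializes on demand.
final class SerializedMetadataCache<K, V> {
    private final Map<K, byte[]> serialized = new ConcurrentHashMap<>();
    private final Function<byte[], V> deserializer;

    SerializedMetadataCache(Function<byte[], V> deserializer) {
        this.deserializer = deserializer;
    }

    /** Stores a private copy of the serialized form; no object graph is retained. */
    void put(K key, byte[] bytes) {
        serialized.put(key, bytes.clone());
    }

    /** Deserializes a fresh object on every call; returns null on a cache miss. */
    V get(K key) {
        byte[] bytes = serialized.get(key);
        return bytes == null ? null : deserializer.apply(bytes);
    }

    /** Total payload bytes held, excluding map and array header overhead. */
    long cachedBytes() {
        long total = 0;
        for (byte[] bytes : serialized.values()) {
            total += bytes.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // A UTF-8 string stands in for stripe metadata in this demo.
        SerializedMetadataCache<String, String> cache =
            new SerializedMetadataCache<>(b -> new String(b, StandardCharsets.UTF_8));
        cache.put("stripe-0", "footer".getBytes(StandardCharsets.UTF_8));
        System.out.println(cache.get("stripe-0") + " / " + cache.cachedBytes() + " bytes");
    }
}
```

      The trade-off is CPU for memory: every read pays the deserialization cost, so this only helps if metadata is read much less often than it sits in the cache, or if a small layer of deserialized objects is kept on top.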

          People

            Assignee: Unassigned
            Reporter: Sergey Shelukhin (sershe)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated: