IMPALA-4029 (sub-task of IMPALA-3173: Reduce catalog's memory footprint)

Reduce memory requirements for storing THdfsFileDesc


Details

    Description

      The in-memory representation of HDFS files in the catalog is highly inefficient and can be improved significantly. Currently, the catalog uses ~400-500 bytes per THdfsFileDesc object, which essentially consists of: a) the file name and b) a list of THdfsFileBlock objects. Each file block stores information about its replicas, their disk IDs, and whether each replica is cached. All of this information is currently stored in Thrift objects and could be compressed significantly.
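      A minimal sketch of what a more compact representation could look like, written in Java because the catalog metadata lives in Impala's Java code. The class name `CompactFileDesc`, the bit layout, and the per-table host list that `hostIndex` points into are all hypothetical assumptions for illustration, not Impala's actual design. Instead of one Thrift object per block and per replica, block metadata becomes primitive arrays and each replica becomes a single packed `int`:

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of a compact HDFS file descriptor (not Impala's actual
 * class). Block and replica metadata are packed into primitive arrays instead
 * of one Thrift object per block and per replica.
 */
final class CompactFileDesc {
  // Assumed bit layout of one packed replica int:
  // bit 0 = cached flag, bits 1-8 = disk ID (0-255), bits 9-30 = host index.
  private static final int DISK_BITS = 8;
  private static final int MAX_DISK_ID = (1 << DISK_BITS) - 1;
  private static final int MAX_HOST_INDEX = (1 << 22) - 1;

  private final byte[] fileName;      // UTF-8 bytes instead of a String object
  private final long[] blockOffsets;  // one entry per block
  private final long[] blockLengths;  // one entry per block
  private final int[] replicaStart;   // block b owns replicas[replicaStart[b] .. replicaStart[b + 1])
  private final int[] replicas;       // packed replica metadata

  CompactFileDesc(String name, long[] offsets, long[] lengths,
                  int[] replicaStart, int[] replicas) {
    if (offsets.length != lengths.length || replicaStart.length != offsets.length + 1) {
      throw new IllegalArgumentException("inconsistent block arrays");
    }
    this.fileName = name.getBytes(StandardCharsets.UTF_8);
    this.blockOffsets = offsets.clone();
    this.blockLengths = lengths.clone();
    this.replicaStart = replicaStart.clone();
    this.replicas = replicas.clone();
  }

  /** Packs one replica; hostIndex refers to a (hypothetical) host list shared by the table. */
  static int packReplica(int hostIndex, int diskId, boolean cached) {
    if (hostIndex < 0 || hostIndex > MAX_HOST_INDEX || diskId < 0 || diskId > MAX_DISK_ID) {
      throw new IllegalArgumentException("value out of range");
    }
    return (hostIndex << (DISK_BITS + 1)) | (diskId << 1) | (cached ? 1 : 0);
  }

  String fileName() { return new String(fileName, StandardCharsets.UTF_8); }
  int numBlocks() { return blockOffsets.length; }
  long blockOffset(int b) { return blockOffsets[b]; }
  long blockLength(int b) { return blockLengths[b]; }
  int numReplicas(int b) { return replicaStart[b + 1] - replicaStart[b]; }

  // Accessors decode the packed int for replica r of block b.
  int hostIndex(int b, int r) { return replicas[replicaStart[b] + r] >>> (DISK_BITS + 1); }
  int diskId(int b, int r) { return (replicas[replicaStart[b] + r] >>> 1) & MAX_DISK_ID; }
  boolean isCached(int b, int r) { return (replicas[replicaStart[b] + r] & 1) != 0; }
}
```

      Storing a small index into a shared host list, rather than a host address per replica, is what makes the 4-byte replica encoding possible; the trade-off is that the host list must be maintained alongside the table's file descriptors.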

      In addition, both the catalog and the Impalad services spend a lot of time (and memory) serializing and deserializing Thrift objects. Using a more efficient serialization library (e.g. FlatBuffers) could significantly improve both memory efficiency and processing speed when handling catalog updates.
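      Part of what makes a format like FlatBuffers attractive here is that fields can be read in place from the serialized bytes, without first deserializing into an object graph. The following sketch illustrates only that idea with a hand-rolled `java.nio.ByteBuffer` layout; `FlatFileDesc` and its byte layout are hypothetical, and this is neither the FlatBuffers API nor its wire format:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical flat binary layout illustrating in-place field access
 * (NOT the FlatBuffers API or wire format). Fields are decoded lazily from
 * the buffer at computed offsets, only when accessed.
 *
 * Layout (little-endian):
 *   int nameLen | byte[nameLen] name | int numBlocks |
 *   numBlocks x (long offset, long length)
 */
final class FlatFileDesc {
  private static final int BLOCK_BYTES = 2 * Long.BYTES;
  private final ByteBuffer buf;
  private final int blocksStart; // byte offset of the first block record

  FlatFileDesc(ByteBuffer encoded) {
    // duplicate() so the caller's position/limit are never modified.
    this.buf = encoded.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    int nameLen = buf.getInt(0);
    this.blocksStart = Integer.BYTES + nameLen + Integer.BYTES;
  }

  /** Writes a file name and its blocks into a single flat buffer. */
  static ByteBuffer encode(String name, long[] offsets, long[] lengths) {
    if (offsets.length != lengths.length) {
      throw new IllegalArgumentException("offsets and lengths differ in size");
    }
    byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
    int size = Integer.BYTES + nameBytes.length + Integer.BYTES + offsets.length * BLOCK_BYTES;
    ByteBuffer out = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    out.putInt(nameBytes.length).put(nameBytes).putInt(offsets.length);
    for (int i = 0; i < offsets.length; i++) {
      out.putLong(offsets[i]).putLong(lengths[i]);
    }
    out.flip();
    return out;
  }

  // Absolute reads: no per-block objects are ever created.
  int numBlocks() { return buf.getInt(blocksStart - Integer.BYTES); }
  long blockOffset(int b) { return buf.getLong(blocksStart + b * BLOCK_BYTES); }
  long blockLength(int b) { return buf.getLong(blocksStart + b * BLOCK_BYTES + Long.BYTES); }

  String fileName() {
    byte[] name = new byte[buf.getInt(0)];
    ByteBuffer view = buf.duplicate();
    view.position(Integer.BYTES);
    view.get(name);
    return new String(name, StandardCharsets.UTF_8);
  }
}
```

      A reader that only needs, say, the block lengths touches just those bytes; with Thrift, the whole descriptor would typically be deserialized into Java objects first.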

          People

            Assignee: Dimitris Tsirogiannis (dtsirogiannis)
            Reporter: Dimitris Tsirogiannis (dtsirogiannis)
            Votes: 0
            Watchers: 2
