Apache Ozone / HDDS-4246

Consider increasing shared RocksDB LRU cache size on datanodes


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Ozone Datanode
    • Labels: None

    Description

      By default, when a RocksDB instance is opened, an 8 MB LRU cache is associated with the instance. From the RocksDB manual, many instances in the same process can share the same LRU cache:

      https://github.com/facebook/rocksdb/wiki/Block-Cache

      A Cache object can be shared by multiple RocksDB instances in the same process, allowing users to control the overall cache capacity.

      This is of particular interest on the datanodes, where there are potentially thousands of small RocksDB instances.

      This RocksDB PR added a feature to the Java implementation, allowing an LRU cache to be explicitly created and passed to different "Options" objects to ensure the same cache is reused:

      import org.rocksdb.*;
      import org.rocksdb.util.SizeUnit;

      // Create one cache up front and share it via the table config.
      Cache cache = new LRUCache(64 * SizeUnit.MB);
      BlockBasedTableConfig table_options = new BlockBasedTableConfig();
      table_options.setBlockCache(cache);
      Statistics stats = new Statistics();
      Options options = new Options();
      options.setCreateIfMissing(true)
          .setStatistics(stats)
          .setTableFormatConfig(table_options);
      ...
      

      Before this feature, the way to reuse a cache across many DB instances was to pass the exact same RocksDB Options object when creating each RocksDB instance. This means that a possible unintended side effect of HDDS-2283 (which caches the RocksDB Options and re-uses them across all container DBs) is that there is now only a single 8 MB RocksDB block cache shared across all the container RocksDBs on the datanode.
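      For illustration, here is a minimal sketch of that effect (the class and method names are mine for this example, not the actual datanode code): one cached Options object re-used for every container DB open, which implies a single shared default block cache:

      import org.rocksdb.Options;
      import org.rocksdb.RocksDB;
      import org.rocksdb.RocksDBException;

      public class SharedOptionsSketch {
        // One Options instance created once and re-used, as HDDS-2283 does
        // with its cached options.
        private static final Options CACHED_OPTIONS =
            new Options().setCreateIfMissing(true);

        // Every container DB opened with the same Options object ends up
        // sharing the same default 8 MB LRU block cache.
        public static RocksDB openContainerDb(String dbPath) throws RocksDBException {
          return RocksDB.open(CACHED_OPTIONS, dbPath);
        }
      }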

      You can see this is the case by grepping the RocksDB LOG file. E.g., with Options caching, in two containers:

      bash-4.2$ grep -A5 "block_cache:" ./hdds/hdds/2ad8eea5-b9e1-41e1-85eb-8cae745efcb6/current/containerDir0/2/metadata/2-dn-container.db/LOG
        no_block_cache: 0
        block_cache: 0x563ba9088bb0    <=====
        block_cache_name: LRUCache
        block_cache_options:
          capacity : 8388608
          num_shard_bits : 4
          strict_capacity_limit : 0
      bash-4.2$ grep -A5 "block_cache:" ./hdds/hdds/2ad8eea5-b9e1-41e1-85eb-8cae745efcb6/current/containerDir0/3/metadata/3-dn-container.db/LOG
        no_block_cache: 0
        block_cache: 0x563ba9088bb0   <=====
        block_cache_name: LRUCache
        block_cache_options:
          capacity : 8388608
          num_shard_bits : 4
          strict_capacity_limit : 0
      

      Note that the block cache in both containers has the same address "0x563ba9088bb0".

      Reverting the caching change, so that a new Options object is passed to each RocksDB instance, we can see the cache addresses are different:

      bash-4.2$ grep -A5 "block_cache:" ./hdds/hdds/ac115132-9693-4ab9-9d73-dd4bf7e40caf/current/containerDir0/4/metadata/4-dn-container.db/LOG
        no_block_cache: 0
        block_cache: 0x7feec0b86270   <=====
        block_cache_name: LRUCache
        block_cache_options:
          capacity : 8388608
          num_shard_bits : 4
          strict_capacity_limit : 0
      bash-4.2$ grep -A5 "block_cache:" ./hdds/hdds/ac115132-9693-4ab9-9d73-dd4bf7e40caf/current/containerDir0/1/metadata/1-dn-container.db/LOG
        no_block_cache: 0
        block_cache: 0x565360926f70   <=====
        block_cache_name: LRUCache
        block_cache_options:
          capacity : 8388608
          num_shard_bits : 4
          strict_capacity_limit : 0
      

      From this, it is very likely that a single 8 MB cache for all containers on a large node is not sufficient. We should consider whether it makes sense to set a larger shared cache size on the DN, or to have several shared caches.

      Note that I have not seen any performance issues caused by this; I came across it while investigating RocksDB in general.
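      As a rough sketch of one possible approach (the 256 MB size and the class/method names below are assumptions for this example, not existing Ozone config or code): create a single, explicitly sized LRUCache per datanode and pass it into the table config used for every container DB:

      import org.rocksdb.BlockBasedTableConfig;
      import org.rocksdb.Cache;
      import org.rocksdb.LRUCache;
      import org.rocksdb.Options;
      import org.rocksdb.util.SizeUnit;

      public class DatanodeBlockCacheSketch {
        // One shared cache sized for all container RocksDBs on the node
        // (256 MB is an illustrative value; ideally it would be configurable).
        private static final Cache SHARED_CACHE = new LRUCache(256 * SizeUnit.MB);

        public static Options newContainerDbOptions() {
          BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
          tableConfig.setBlockCache(SHARED_CACHE);
          return new Options()
              .setCreateIfMissing(true)
              .setTableFormatConfig(tableConfig);
        }
      }

      With something like this, every container DB would still report the same block_cache address in its LOG, but with a capacity we control rather than the 8 MB default.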


          People

            Assignee: Unassigned
            Reporter: Stephen O'Donnell
            Votes: 0
            Watchers: 8
