Description
To create a sharded cluster, each component needs an index number that identifies its shard.
Currently, app_container_tag can be used as the shard index number.
Replication nodes can also be added to increase data availability.
For example, for a cluster with 5 shards and a replication factor of 2, the shard index numbers are assigned like this:
total desired components = 10
shard_index = app_container_tag % 5
replication = 0 if app_container_tag < 5 else 1
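The mapping above can be written out as a small Python sketch. It assumes that app_container_tag values are zero-based and contiguous (0 to 9 here), which this description does not state. The function name tag_to_slot is illustrative only and is not part of the patch. Integer division is used so that the same rule extends to more than two replication groups.

```python
def tag_to_slot(app_container_tag, num_shards):
    """Map a container tag to (shard_index, replication).

    Assumption: tags are zero-based and contiguous.
    """
    shard_index = app_container_tag % num_shards
    # For tags 0..9 and 5 shards this is 0 for tags below 5, otherwise 1.
    replication = app_container_tag // num_shards
    return shard_index, replication


# 5 shards x 2 replication groups = 10 components
slots = [tag_to_slot(tag, 5) for tag in range(10)]
```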
However, using app_container_tag causes a problem when decreasing the number of components.
We cannot tell whether an allocated component is a shard component or a replication component.
As a result, a shard component can be released even when we only want to remove a replication component.
To support sharded clusters, I added the following functionality.
- write each allocated component's shard and replication information to ZooKeeper
- when releasing components, release replication components first
- add a command to change the shard and replication configuration
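The release rule in the list above can be illustrated with a short Python sketch. It is not the code in this patch: the function name pick_release_order is hypothetical, and it assumes that "replication component" means the slot with the higher replication index and that each allocated component is recorded as a (shard_index, replication) pair.

```python
def pick_release_order(allocated, count):
    """Return `count` slots to release, highest replication index first.

    `allocated` is an iterable of (shard_index, replication) tuples.
    """
    # Sort by replication descending so replicas go before primary shards;
    # shard index descending is only a tie-breaker to keep the order stable.
    ordered = sorted(allocated, key=lambda slot: (-slot[1], -slot[0]))
    return ordered[:count]


# 5 shards x 2 replication groups, then scale down by 3 components
allocated = [(s, r) for r in range(2) for s in range(5)]
to_release = pick_release_order(allocated, 3)
```

With this ordering, scaling down by up to 5 components in the example cluster only removes slots with replication 1, so every shard keeps at least one component.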
The following configuration and script functions also need to be added.
- Application Configuration
"global": {
    "create.default.zookeeper.node": "true",
    "site.global.zookeeper.quorum": "${ZK_HOST}",
    "site.global.zookeeper.port": "2181",
    "site.global.zookeeper.znode.parent": "${DEFAULT_ZK_PATH}",
    "site.global.COMPONENT_NAME.shard": "10",
    "site.global.COMPONENT_NAME.replication": "2"
}
- Resource Specification
"components": {
    "COMPONENT_NAME": {
        "yarn.role.priority": "1",
        "yarn.component.instances": "10",
        "yarn.vcores": "2",
        "yarn.memory": "8192",
        "yarn.component.placement.policy": "32",
        "yarn.placement.escalate.seconds": "60"
    }
}
- Placement Policy
DEFAULT_SHARD_REPLICATION = 32
ANTI_AFFINITY_SHARD_REPLICATION = 64
- call the allocation function in the start() function of the App Package
import container_tag
...
def start(self, env):
    import params
    env.set_params(params)
    shard, replication = container_tag.allocate_container_tag(params.zk_server, params.zk_parent, 'COMPONENT_NAME', params.container_id)
    ...
- prepare the arguments for container_tag.allocate_container_tag() in params.py
config = Script.get_config()
global_conf = config['configurations']['global']
zk_quorum = global_conf['zookeeper.quorum']
zk_port = global_conf['zookeeper.port']
# build a "host1:port,host2:port" connection string from the quorum hosts
zk_server = ','.join([h + ':' + str(zk_port) for h in zk_quorum.split(',')])
zk_parent = global_conf['zookeeper.znode.parent']
container_id = global_conf['app_container_id']
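To make the idea of allocating a shard and replication slot per container more concrete, here is a minimal Python sketch of one possible allocation rule. It is not the container_tag module from this patch: the function allocate_slot and its parameters are hypothetical, and a plain dict stands in for the znodes that would be written to ZooKeeper, so the sketch does not handle concurrent allocation.

```python
def allocate_slot(registry, container_id, num_shards, num_replication):
    """Assign the first free (shard, replication) slot to container_id.

    `registry` maps (shard, replication) -> container_id and stands in for
    the znodes a real implementation would create in ZooKeeper.
    """
    # Return the existing slot if this container already holds one.
    for slot, owner in registry.items():
        if owner == container_id:
            return slot
    # Fill replication 0 for every shard before starting replication 1.
    for replication in range(num_replication):
        for shard in range(num_shards):
            if (shard, replication) not in registry:
                registry[(shard, replication)] = container_id
                return shard, replication
    raise RuntimeError("no free shard/replication slot")


registry = {}
first = allocate_slot(registry, "container_01", 5, 2)
second = allocate_slot(registry, "container_02", 5, 2)
again = allocate_slot(registry, "container_01", 5, 2)
```

Recording the slot under the container ID means a container that asks again gets the same shard back, which is what makes it possible to tell shard components from replication components when scaling down.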