  Apache Helix / HELIX-659

Extend Helix to Support Resource with Multiple States

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.x
    • Fix Version/s: 0.6.9
    • Component/s: helix-core
    • Labels:
      None

      Description

      Problem Statement

      Single State Model vs. Multiple State Models

      Currently, each Helix resource is associated with a single state model, and at any time each replica of a partition can be in only one of the states defined in that model. Helix manages state transitions based on this single state model.

      However, in many scenarios, resources are too complex to be modeled by a single state model.
      For example, the partitions of a resource could be described along different dimensions: master/slave state, read/write state, and version. These represent different dimensions of the overall resource status, and the states in each dimension are based on a different state model. Note that the state machines in this document are simplified.

      The basic idea is that the states in these three dimensions are parallel and can change independently. For instance, the R/W state may change without updating the master/slave state.

      Finite State Machine vs. Dynamic State Model

      In addition, Helix uses a finite state machine to define a state model. However, some state models cannot easily be modeled by a finite state machine with fixed states; versions are one example. We call such a model a dynamic state model: its states are read, set, and interpreted by the application. Helix needs to be extended to support such dynamic state models. Note that Helix should not, and will not be able to, calculate the best possible dynamic states.

      The version of a piece of software is one of the best examples for understanding dynamic states.

      Consider an application that is deployed on multiple nodes working together as a cluster. The green node works as the master, and all dark blue nodes are slaves. When admins upgrade the service from 1.0.0 to 1.1.0, they need to ensure that all nodes are upgraded before declaring the upgrade complete. After the upgrade, it is important to ensure that all software versions are consistent.

      If the Helix framework is used to support upgrading the cluster, it can simplify application logic and help ensure consistency. For instance, the service (cluster) itself is regarded as the resource, and each node is mapped to a partition. Upgrading is then simply a state transition, and admins can check the external view to verify consistency.
      Note that during this version upgrade, the master node remains the master and the slave nodes remain slaves, so the version state is parallel to, and independent of, the other states.
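      A minimal Java sketch of that consistency check follows. It assumes, for illustration only, that each node's version has already been read out of the external view into a plain map from instance name to version; UpgradeCheck and its methods are hypothetical names, not part of the Helix API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: checks that every node reports the target version. */
final class UpgradeCheck {
  private UpgradeCheck() {}

  /** Returns the instances (sorted) whose reported version differs from the target. */
  static Set<String> laggingInstances(Map<String, String> versionByInstance, String targetVersion) {
    Set<String> lagging = new TreeSet<>();
    for (Map.Entry<String, String> e : versionByInstance.entrySet()) {
      if (!targetVersion.equals(e.getValue())) {
        lagging.add(e.getKey());
      }
    }
    return lagging;
  }

  /** The upgrade can be declared done only when no instance is lagging. */
  static boolean upgradeComplete(Map<String, String> versionByInstance, String targetVersion) {
    return laggingInstances(versionByInstance, targetVersion).isEmpty();
  }
}
```

      Returning the lagging instances, rather than only a boolean, tells admins which nodes still need attention.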

        Activity

        jiajunwang (Jiajun Wang) added a comment (edited)

        Based on all that is discussed above, let us imagine a resource represented by 3 independent state models: MasterSlave, ReadWrite, and Versions. The following figure shows three possible state transitions for a replica of the resource.

        Partition 1 has an internal error, so although it is still the master, its R/W state transitions to "Error". Meanwhile, its version needs to be upgraded.
        Partition 2 changes to "R/W", probably because partition 1 no longer serves as an "R/W" node.
        As for partition 3, all of its states change.

        The difficulties of supporting this request with the current Helix system include, but are not limited to, the following.

        It is hard to define state machines or transition constraints for all state models using a single state model

        For a dynamic state, pre-defined state model won't work at all.

        But even if we only consider regular states, there is still a problem. With the existing framework, supporting such a scenario would require a very complex state model that combines all three models. The result would be 2 * 3 * 4 = 24 states and around 80 possible transition paths, which would be very hard to implement.
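        The blow-up comes from taking the Cartesian product of the individual models. The Java sketch below enumerates that product; the three state lists used with it are assumptions picked from the examples in this comment (Online/Offline, four R/W states, three versions) so that they reproduce the 24 combined states, and CombinedStateSpace is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerates every combined state of independent state models. */
final class CombinedStateSpace {
  private CombinedStateSpace() {}

  /** Each inner list of the input holds the states of one model. */
  static List<List<String>> cartesianProduct(List<List<String>> models) {
    List<List<String>> result = new ArrayList<>();
    result.add(new ArrayList<>()); // start from a single empty tuple
    for (List<String> modelStates : models) {
      List<List<String>> next = new ArrayList<>();
      for (List<String> prefix : result) {
        for (String state : modelStates) {
          List<String> tuple = new ArrayList<>(prefix);
          tuple.add(state); // extend the tuple with one state of this model
          next.add(tuple);
        }
      }
      result = next;
    }
    return result;
  }
}
```

        Each extra model multiplies the number of combined states, and every transition between combined states would have to be declared by hand in a single state model.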

        State transitions may be inefficient

        Imagine that each state transition message contains the delta of a single state. The messages would be as follows:

        Partition  State transitions
        R1         (Online, R/W, 1.0.1) → (Online, Error, 1.0.1)
                   (Online, Error, 1.0.1) → (Online, Error, 1.0.2)
        R2         (Online, Init, 1.0.1) → (Online, R/W, 1.0.1)
        R3         (Offline, Init, 1.0.1) → (Online, Init, 1.0.1)
                   (Online, Init, 1.0.1) → (Online, Ready, 1.0.1)
                   (Online, Ready, 1.0.1) → (Online, Ready, 1.0.2)

        Clearly, this strategy increases traffic and makes the whole transition process much slower.
        A simpler design is for each message to carry all the necessary information:

        Partition  State transitions
        R1         (Online, R/W, 1.0.1) → (Online, Error, 1.0.2)
        R2         (Online, Init, 1.0.1) → (Online, R/W, 1.0.1)
        R3         (Offline, Init, 1.0.1) → (Online, Ready, 1.0.2)
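        The trade-off between the two tables can be seen by computing, for one partition, which dimensions actually change. In this minimal Java sketch, each state is assumed to be an ordered list (main state, R/W state, version), and StateDelta is a hypothetical helper: the first strategy sends one message per changed dimension, while the second sends a single message carrying all of them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: compares current and target state tuples dimension by dimension. */
final class StateDelta {
  private StateDelta() {}

  /** Indexes of the dimensions whose state differs between current and target. */
  static List<Integer> changedDimensions(List<String> current, List<String> target) {
    if (current.size() != target.size()) {
      throw new IllegalArgumentException("tuples must have the same number of dimensions");
    }
    List<Integer> changed = new ArrayList<>();
    for (int i = 0; i < current.size(); i++) {
      if (!current.get(i).equals(target.get(i))) {
        changed.add(i);
      }
    }
    return changed;
  }

  /** Messages needed when each message carries the delta of a single dimension. */
  static int singleDeltaMessageCount(List<String> current, List<String> target) {
    return changedDimensions(current, target).size();
  }

  /** Messages needed when one message carries the complete target state. */
  static int combinedMessageCount(List<String> current, List<String> target) {
    return changedDimensions(current, target).isEmpty() ? 0 : 1;
  }
}
```

        For R1, R2, and R3 above, the single-delta strategy needs 2, 1, and 3 messages respectively, while the combined strategy needs one message each.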

        But this design brings other issues.

        1. When a participant gets a message, it reports the new states only after finishing all the changes. If one of these state transitions takes considerably longer than the others, the whole process is blocked.
        2. The controller has less control over how a participant performs state transitions. This is a problem if a policy such as Helix State Transition Priority Support needs to be applied.
        3. The participant needs to inspect the message and compare states itself, which makes backward compatibility hard to ensure.

        Helix is not able to calculate the best possible state for every state model

        With dynamic states, we allow the application to manage state transitions, so the state model does not define complete constraints and requirements. As a result, Helix cannot calculate the best possible states.

        Moreover, even for a non-dynamic state, the application may want to trigger transitions based on external factors. In this case, Helix only coordinates the state transitions; it does not compute the best possible states.

        To let users define such states, we need to provide a new state model type, and Helix must be able to interpret its definition and generate transition messages correctly.

        Additional Case Study

        Ambry R/W State

        In Ambry, a partition has an "R/W" state in addition to its OnlineOffline state, so a partition can be "ONLINE:READ" or "ONLINE:WRITE".
        The "R/W" state indicates whether the partition is read-only or writable.
        There may be state transitions as shown in the following figure.

        • The first state transition is conducted by the Ambry application.
        • The second one is regular state transition managed by Helix.

        Note that the "R/W" state model is still a regular model, which means its states are pre-defined and its constraints are defined in the same way as for a regular state.

        Pinot Version State

        In Pinot, when a new version of data is ready, the system replaces the old partitions with the new ones.
        If the replacement is done one partition at a time, any query issued during the upgrade period can get inconsistent data.
        Currently, the application needs a workaround to keep the data consistent.

        • Option 1: create a new resource with the latest version, and replace the old resource once the new one is ready.
        • Option 2: maintain a customized configuration or property store item to manage versions inside the application.

        So the expected state transitions of a Pinot segment are as follows.

        It would be very helpful to extend Helix state transition system to support multiple state models.

        Proposal

        In this document, we propose to extend the existing state transition system in Helix. Basically, Helix should allow one resource/partition to have more than one state, with the states managed separately according to different state models.

        State transitions shall follow these rules:

        • If only one state changes, the state transition logic remains the same as it is today.
        • States have different priorities. If more than one state changes, Helix performs the transitions one by one in order of state model priority, sending the transition messages one after another.
        • States may have dependencies. If state B depends on state A, a transition of state B requires state A's information, and if state A is in the error state, state B's transition is suspended. Otherwise, independent state transitions do not block each other.
        • If a state is managed by the application, Helix won't calculate its ideal state. The application needs to specify the desired state in the resource configuration.

        State Dependency and Priority

        A complete multi-state definition will be a hierarchical system in which states are divided into levels. First-level states are the most important ones, and there may be additional second- or third-level states related to higher-level states. States at the same level are independent of each other.

        For example, admins may set the master/slave (MS) state as the first-level state, with both the R/W state and Version depending on it.
        That means transitions of the R/W state or Version require the MS state as input, and if the MS state is in an error condition, no transition of the other states is allowed.
        However, the R/W state and Version can change in parallel.
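        The dependency rule can be sketched as a check the controller might run before scheduling a transition. This is a minimal illustration only: DependencyCheck is a hypothetical class, and the parent map, model names, and literal "ERROR" state are assumptions for the example.

```java
import java.util.Map;

/** Hypothetical dependency check for a hierarchy of state models on one partition. */
final class DependencyCheck {
  private DependencyCheck() {}

  /**
   * A transition on {@code model} is allowed only if no ancestor model is in ERROR.
   * Assumes the dependency graph is acyclic; a cycle would loop forever.
   *
   * @param parentOf     maps a model to the model it depends on; level-1 models have no entry
   * @param currentState maps each model to its current state
   */
  static boolean canTransition(String model, Map<String, String> parentOf,
                               Map<String, String> currentState) {
    String parent = parentOf.get(model);
    while (parent != null) {
      if ("ERROR".equals(currentState.get(parent))) {
        return false; // a dependency is broken, so this transition is suspended
      }
      parent = parentOf.get(parent); // walk up towards the level-1 model
    }
    return true;
  }
}
```

        Siblings such as the R/W state and Version never block each other here; only an ERROR higher up in the hierarchy does.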

        In addition to dependencies, admins will be able to specify priorities for all related state models: if multiple states change concurrently, Helix processes the higher-priority state transitions first. As shown in the following figure, both the R/W state and Version are level-2 states, but if admins configure Version to have a higher priority, Helix will schedule it before the R/W state.
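        Priority ordering can be sketched as a stable sort of the pending transitions for one partition. The PendingTransition and PriorityScheduler classes are hypothetical, and the convention that a larger number means a higher priority is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical pending transition of one state model on one partition. */
final class PendingTransition {
  final String stateModel;
  final String fromState;
  final String toState;
  final int priority; // assumption: larger value = higher priority

  PendingTransition(String stateModel, String fromState, String toState, int priority) {
    this.stateModel = stateModel;
    this.fromState = fromState;
    this.toState = toState;
    this.priority = priority;
  }
}

final class PriorityScheduler {
  private PriorityScheduler() {}

  /** Returns the transitions in the order they would be sent: highest priority first. */
  static List<PendingTransition> order(List<PendingTransition> pending) {
    List<PendingTransition> sorted = new ArrayList<>(pending);
    // List.sort is stable, so transitions with equal priority keep their input order.
    sorted.sort(Comparator.comparingInt((PendingTransition t) -> t.priority).reversed());
    return sorted;
  }
}
```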

        Application Managed State and Dynamic State

        The nature of a dynamic state makes it application-managed by default. However, not all application-managed states are dynamic states.

        Comparing the state model definitions from different aspects, the differences between the regular state model and the new state models are clear.
        Details of the dynamic state design, and of how to extend the current state model interface, will be discussed as a separate topic. In this document, we only consider the simplest design that supports the basic features. More information is given in the "Design Details" section.

        Definition                 States   Transition constraint              Next state
        Regular state              Fixed    State machine                      Helix decides new state
        Dynamic state              Dynamic  Check based on regex, or no check  Application decides new state
        Application-managed state  Both     Both                               Application decides new state
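        For a dynamic state, the "check based on regex" entry in the table could be as simple as the validator below. The version pattern \d+\.\d+\.\d+ (matching values such as 1.0.1 used in this document) and the DynamicStateValidator class are assumptions for illustration; passing no pattern models the "no check" option.

```java
import java.util.regex.Pattern;

/** Hypothetical validator for the new value of a dynamic state. */
final class DynamicStateValidator {
  private final Pattern allowed; // null means "no check"

  DynamicStateValidator(String regex) {
    this.allowed = regex == null ? null : Pattern.compile(regex);
  }

  /** Rejects empty values; otherwise the whole value must match the pattern, if any. */
  boolean isValid(String newState) {
    if (newState == null || newState.isEmpty()) {
      return false;
    }
    return allowed == null || allowed.matcher(newState).matches();
  }
}
```

        Usage: new DynamicStateValidator("\\d+\\.\\d+\\.\\d+") accepts "1.0.2" but rejects "1.0" or "latest".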

        Multiple State Models vs. Single State Model

        Should we use a separate state model for each state, or define one large state model that handles all state transitions?

        • In the first option, all state models are treated equally, so Helix has to resolve the state dependencies. However, it is easier for application developers to define these state models.
        • In the second option, the relationships between states can be defined and resolved in the state model class, which simplifies the management logic. However, defining constraints and state transition rules will be difficult for application developers.

        In this design document, we take the first option to limit the change and ensure backward compatibility, but we may consider the other option in the future.

        The whole feature implementation is divided into 2 phases.

        1. Support secondary states (described in "The First Milestone").
        2. Fully support multiple states with a hierarchical structure and all related features.

        The First Milestone

        As the first milestone, we plan to add secondary states support as an optional feature.

        The reasons we don't implement the whole feature in one step are:

        1. To limit the change for faster iteration.
        2. To ensure backward compatibility until a major version upgrade. Legacy participants won't be able to handle complicated multi-state transition requests.

        Secondary States

        • The secondary states are configured separately but in the same way as the main state.
        • Each secondary state shall use a different state model to avoid conflicts, and none of them may use the main state model.
        • The secondary states are level-2 states, while the main state is regarded as the level-1 state. Admins will be able to configure secondary states as dynamic states. All secondary states have the same priority.
        • Helix doesn't calculate the ideal state for secondary states; only an update to the resource configuration triggers a secondary state transition. The state model can be either a regular one with constraints or a dynamic state model.

        The following figure demonstrates the workflow of secondary state registration and transition.
        Note that, except for how transitions are triggered, the major steps are the same as in the existing state transition mechanism.

        jiajunwang (Jiajun Wang) added a comment (edited)

        Design Details

        Register Secondary States Model / Factory

        Note that if a secondary state model is dynamic, defaultTransitionHandler has to be implemented.

        State Model Factory

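        // DynamicStateModelFactory.java
        import org.apache.helix.participant.statemachine.StateModelFactory;

        public abstract class DynamicStateModelFactory extends StateModelFactory<DynamicStateModel> {
          // ...
        }

        // DynamicStateModel.java
        import org.apache.helix.NotificationContext;
        import org.apache.helix.model.Message;
        import org.apache.helix.participant.statemachine.StateModel;
        import org.apache.helix.participant.statemachine.StateTransitionError;
        import org.apache.helix.participant.statemachine.Transition;
        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        public abstract class DynamicStateModel extends StateModel {
          private static final Logger LOG = LoggerFactory.getLogger(DynamicStateModel.class);
          static final String DEFAULT_INITIAL_STATE = "UNKNOWN";

          protected DynamicStateModel() {
            // Set the inherited _currentState instead of declaring a second, shadowing field.
            _currentState = DEFAULT_INITIAL_STATE;
          }

          @Override
          public String getCurrentState() {
            return _currentState;
          }

          // ---------- Changed part ----------
          // Intended to be invoked when no @Transition method matches the requested
          // from/to pair; how it is bound to incoming messages is still to be designed.
          public void defaultTransitionHandler(Message message, NotificationContext context) {
            LOG.error("Default transition handler. The idea is to invoke this if no transition method is found. To be implemented");
          }

          @Override
          public boolean updateState(String newState) {
            _currentState = newState;
            return true;
          }

          @Override
          public void rollbackOnError(Message message, NotificationContext context,
              StateTransitionError error) {
            LOG.error("Default rollback method invoked on error. Error Code: " + error.getCode());
          }

          @Override
          public void reset() {
            LOG.warn("Default reset method invoked. Either because the process no longer owns this resource or the session timed out");
          }

          // ---------- Internal states such as ERROR still exist and are supported ----------
          @Transition(to = "DROPPED", from = "ERROR")
          public void onBecomeDroppedFromError(Message message, NotificationContext context)
              throws Exception {
            LOG.info("Default ERROR->DROPPED transition invoked.");
          }
        }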

        Resource Configuration

        Secondary states are conceptually map values.
        Besides the state itself, each state model may also have a different factory name, so there will be both <StateModel, Factory> and <StateModel, State> pairs.

        We keep the existing design: (1) state configurations are at the partition level; (2) state model factory configurations are at the resource level.

        In order to allow multiple states to be configured, we propose to represent it in JSON string format. Note that the state model name is used as the key, so no duplicate model can be used in one partition.

        Resource config with secondary state VERSION

        {
          "id":"Test_Resource"
          ,"simpleFields":{
            "SECONDARY_STATE_MODEL_DEF" : "{VERSION : VersionStateModelFactory}"
          }
          ,"mapFields":{
            "partition_1" : "{VERSION : 1.0.1}"
            ,"partition_2" : "{VERSION : 1.0.2}"
          }
        }

        Additional APIs to configure secondary states

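        // Note: if these are added next to the existing setConfig(HelixConfigScope, Map<String, String>)
        // and getConfig(HelixConfigScope, List<String>) methods, they will not compile: the first has the
        // same erasure, and the second differs only in return type, so distinct names would be needed.

        /**
         * Set configuration values.
         * @param scope
         * @param listProperties
         */
        void setConfig(HelixConfigScope scope, Map<String, List<String>> listProperties);

        /**
         * Get configuration values.
         * @param scope
         * @param keys
         * @return configuration values ordered by the provided keys
         */
        Map<String, List<String>> getConfig(HelixConfigScope scope, List<String> keys);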

        Partitions with the Secondary States shown in Current State and External View

        The current state shows both the secondary state models and their states, in the same format as the resource configuration.

        Current States

        {
          "id":"example_resource"
          ,"simpleFields":{
            "STATE_MODEL_DEF":"MasterSlave"
            ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
            ,"BUCKET_SIZE":"0"
            ,"SESSION_ID":"25b2ce5dfbde0fa"
            ,"SECONDARY_STATE_MODEL_DEF" : "{VERSION : VersionStateModelFactory}"
          }
          ,"listFields":{
          }
          ,"mapFields":{
            "partition_1":{
              "CURRENT_STATE":"MASTER"
              ,"SECONDARY_STATES":"{VERSION : 1.0.1}"
              ,"INFO":""
            }
            ,"partition_2":{
              "CURRENT_STATE":"SLAVE"
              ,"SECONDARY_STATES":"{VERSION : 1.0.1}"
              ,"INFO":""
            }
          }
        }

        For the external view, there are two options for showing secondary states.
        1. Combine the main state and the secondary states into a single string, separated by ":".

        Secondary state in External View

        {
          "id":"example_resource"
          ,"simpleFields":{
            "STATE_MODEL_DEF_REF":"MasterSlave"
            ,"ASSOCIATE_STATE_MODEL_DEF_REFS" : "{VERSION : VersionStateModelFactory}"
          }
          ,"listFields":{
          }
          ,"mapFields":{
            "example_resource_0":{
              "app0004.stg.com_11900":"{MasterSlave : MASTER}:{VERSION : 1.0.1}"
              ,"app0048.stg.com_11900":"{MasterSlave : SLAVE}:{VERSION : 1.0.0}"
            }
          }
        }

        2. Add new fields that show the secondary states separately.

        Secondary state in External View

        {
          "id":"example_resource"
          ,"simpleFields":{
            "STATE_MODEL_DEF_REF":"MasterSlave"
            ,"ASSOCIATE_STATE_MODEL_DEF_REFS" : "{VERSION : VersionStateModelFactory}"
          }
          ,"listFields":{
          }
          ,"mapFields":{
            "example_resource_0":{
              "app0004.stg.com_11900":"MASTER"
              ,"app0048.stg.com_11900":"SLAVE"
              ,"app0004.stg.com_11900_SECONDARY_STATE":"{VERSION : 1.0.1}"
              ,"app0048.stg.com_11900_SECONDARY_STATE":"{VERSION : 1.0.0}"
            }
          }
        }

        Both options actually have backward-compatibility issues. The first design changes the state string, so legacy clients won't be able to interpret it. The second design adds entries to each partition's map, so applications that read these maps will find additional entries whose keys are not valid instance names.
        Comparing the two options, the first one fits our long-term goals much better, so it is our choice for phase one.
        To address the backward-compatibility issue, we plan to create an additional external view ZK node holding the new format, while keeping the old external view node unchanged.
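        Because the chosen format packs several {model : state} pairs into one string, readers of the new external view node must be able to split it again, and a naive split on ":" fails because each pair itself contains " : ". A minimal Java sketch, assuming exactly the layout shown in option 1 and that state values never contain the sequence "}:{"; CombinedStateCodec is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical codec for the combined external view state string (option 1). */
final class CombinedStateCodec {
  private CombinedStateCodec() {}

  /** {MasterSlave=MASTER, VERSION=1.0.1} -> "{MasterSlave : MASTER}:{VERSION : 1.0.1}" */
  static String encode(Map<String, String> stateByModel) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : stateByModel.entrySet()) {
      if (sb.length() > 0) {
        sb.append(':'); // separator between pairs
      }
      sb.append('{').append(e.getKey()).append(" : ").append(e.getValue()).append('}');
    }
    return sb.toString();
  }

  /** Inverse of encode; preserves the order of the pairs (main state model first). */
  static Map<String, String> decode(String encoded) {
    Map<String, String> stateByModel = new LinkedHashMap<>();
    if (encoded.isEmpty()) {
      return stateByModel;
    }
    // Drop the outer braces, then split between pairs on "}:{".
    String body = encoded.substring(1, encoded.length() - 1);
    for (String pair : body.split("\\}:\\{")) {
      int sep = pair.indexOf(" : ");
      stateByModel.put(pair.substring(0, sep), pair.substring(sep + 3));
    }
    return stateByModel;
  }
}
```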

        State Transition Message

        When multiple states change, the messages are sent in priority order. There are no parallel state transitions on one partition.

        Helix Controller Updates

        When resource configuration is changed:

        • Fill ClusterDataCache with secondary states and state models/factories.
        • Compute the state delta and compose messages accordingly, ordered by state model priority.
        • Send the highest priority message to the participant.

        One optimization opportunity is allowing parallel state transition messages if there is no conflict.
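        The rule that a partition never runs two transitions in parallel can be sketched as a per-partition queue on the controller side: messages are queued in priority order, and the next one is released only after the participant reports the previous one as finished. PartitionDispatcher is a hypothetical class that uses plain strings for messages; it is not Helix's actual message pipeline.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical controller-side queue: at most one in-flight transition per partition. */
final class PartitionDispatcher {
  private final Map<String, Deque<String>> queued = new HashMap<>();
  private final Map<String, String> inFlight = new HashMap<>();

  /**
   * Queues a message; callers submit messages already sorted by priority.
   * Returns the message if it can be sent now, or null if it must wait.
   */
  String submit(String partition, String message) {
    queued.computeIfAbsent(partition, p -> new ArrayDeque<>()).addLast(message);
    return inFlight.containsKey(partition) ? null : releaseNext(partition);
  }

  /** Called when the participant reports completion; returns the next message to send, or null. */
  String onCompleted(String partition) {
    inFlight.remove(partition);
    return releaseNext(partition);
  }

  private String releaseNext(String partition) {
    Deque<String> q = queued.get(partition);
    if (q == null || q.isEmpty()) {
      return null;
    }
    String next = q.pollFirst();
    inFlight.put(partition, next);
    return next;
  }
}
```

        Allowing parallel messages when there is no conflict, as suggested above, would amount to relaxing the single in-flight slot per partition.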

        When participant current state is changed:

        • Read the secondary states and write the encoded, complete state information to the new external view ZK node.

        Helix Participant Updates

        On receiving state transition message:

        • Check whether the message targets a registered state model; if so, trigger the state transition.
        • If any state transition fails, set the error state and stop processing. The user should fix the problem and reset the state to its initial value.
        • If state transition succeeds, update the current state.
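        The participant steps above can be sketched as follows. Assumptions for illustration: transition logic is registered per state model name as a plain Java callback, the literal "ERROR" is the error state, and SecondaryStateParticipant is a hypothetical class rather than an existing Helix API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical participant-side handling of secondary state transition messages. */
final class SecondaryStateParticipant {
  static final String ERROR = "ERROR";

  private final Map<String, BiConsumer<String, String>> handlers = new HashMap<>();
  private final Map<String, String> currentState = new HashMap<>();

  void register(String stateModel, String initialState, BiConsumer<String, String> handler) {
    handlers.put(stateModel, handler);
    currentState.put(stateModel, initialState);
  }

  /** Returns true if the transition succeeded and the current state was updated. */
  boolean onMessage(String stateModel, String toState) {
    BiConsumer<String, String> handler = handlers.get(stateModel);
    if (handler == null) {
      return false; // not a registered state model
    }
    String fromState = currentState.get(stateModel);
    if (ERROR.equals(fromState)) {
      return false; // processing stays stopped until the user resets the state
    }
    try {
      handler.accept(fromState, toState); // application-defined transition logic
    } catch (RuntimeException e) {
      currentState.put(stateModel, ERROR);
      return false;
    }
    currentState.put(stateModel, toState);
    return true;
  }

  /** The user fixes the problem and resets the model to its initial state. */
  void reset(String stateModel, String initialState) {
    currentState.put(stateModel, initialState);
  }

  String getCurrentState(String stateModel) {
    return currentState.get(stateModel);
  }
}
```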

        Alternative Options for Supporting Additional States

        Introducing special state for additional status change

        Add a new internal state UPGRADING (or another special state) for status changes.
        Any additional status change then happens when a partition transitions to or from the UPGRADING state.
        Note that the application is free to define whether UPGRADING is a special online status or not. This decouples the main state from the additional "states".
        In the Pinot case, an upgrading partition might still be active, even before it is back ONLINE.

        The problem with this new state is that it only works for a single additional state model.
        Once there is more than one state model to handle, and they change separately, an UPGRADING state is not enough.

        Rely on resetting partition to load new "states"

        Whenever new states are to be set, the application updates the resource configuration and then resets all partitions.
        During the subsequent OFFLINE to ONLINE transition, participants read the new states from the configuration and apply them to the related partitions.

        The problem is that changing the additional states affects the main state: each partition is offline for a while.

        Application registers an additional message handler for customized transition messages

        In this approach, the application owns the logic. Helix only dispatches a customized state transition message to trigger the operation. In the message handler, the application reads and writes the additional state information in the property store.

        Since additional states are a generic requirement, it does not make sense for multiple applications to implement similar logic separately.


          People

          • Assignee: Jiajun Wang (jiajunwang)
          • Reporter: Jiajun Wang (jiajunwang)
          • Votes: 0
          • Watchers: 1
