IMPALA-10788

Statestore Scalability document should mention statestore_subscriber_timeout_secs and statestore_heartbeat_tcp_timeout_seconds



    • Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version: Impala 4.1.0


      The current documentation on statestore scalability is at http://impala.apache.org/docs/build/html/topics/impala_scalability.html#ariaid-title3

      We should add statestore_heartbeat_tcp_timeout_seconds and statestore_subscriber_timeout_secs to the doc. They also affect heartbeat timeout detection. Incorrect settings may result in false-positive liveness detection and in queries failing with the "Cancelled due to unreachable impalad(s)" error.

      Current document:

      Scalability Considerations for the Impala Statestore

      Before Impala 2.1, the statestore sent only one kind of message to its subscribers. This message contained all updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a subscriber to decide whether or not the subscriber had failed.

      Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or restarted.

      As of Impala 2.1, the statestore now sends topic updates and heartbeats in separate messages. This allows the statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to send topic updates according to a fixed schedule, reducing statestore network overhead.

      The statestore now has the following relevant configuration flags for the statestored daemon:

      -statestore_num_update_threads
      The number of threads inside the statestore dedicated to sending topic updates. You should not typically need to change this value.
      Default: 10

      -statestore_update_frequency_ms
      The frequency, in milliseconds, with which the statestore tries to send topic updates to each subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends topic updates as fast as it can. You should not typically need to change this value.
      Default: 2000

      -statestore_num_heartbeat_threads
      The number of threads inside the statestore dedicated to sending heartbeats. You should not typically need to change this value.
      Default: 10

      -statestore_heartbeat_frequency_ms
      The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber. This value is suitable for large catalogs and clusters up to approximately 150 nodes. Beyond that, you might need to increase this value to lengthen the interval between heartbeat messages.
      Default: 1000 (one heartbeat message every second)
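
      As an illustration only, the four settings above map to statestored startup flags. The invocation below is schematic (the binary name and flag spellings follow the Apache Impala documentation) and simply restates the defaults:

```shell
# Schematic statestored invocation restating the documented defaults;
# in practice these flags are only changed for very large clusters.
statestored \
  --statestore_num_update_threads=10 \
  --statestore_update_frequency_ms=2000 \
  --statestore_num_heartbeat_threads=10 \
  --statestore_heartbeat_frequency_ms=1000
```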

      If it takes a very long time for a cluster to start up, and impala-shell consistently displays "This Impala daemon is not ready to accept user requests", the statestore might be taking too long to send the entire catalog topic to the cluster. In this case, consider adding --load_catalog_in_background=false to your catalog service configuration. This setting stops the statestore from loading the entire catalog into memory at cluster startup. Instead, metadata for each table is loaded when the table is accessed for the first time.
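
      For illustration, the workaround above could be applied as a catalog service startup flag; the invocation is schematic, while the flag itself is the one named in the paragraph:

```shell
# Schematic catalogd startup with background catalog loading disabled;
# table metadata is then loaded lazily on first access.
catalogd --load_catalog_in_background=false
```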

      We need to add these:
      -statestore_heartbeat_tcp_timeout_seconds
      The time, in seconds, after which a heartbeat RPC to a subscriber times out. This setting protects against badly hung machines that cannot respond to the heartbeat RPC in short order. Increase it if intermittent heartbeat RPC timeouts appear in the statestore's log. You can reference the max value of the "statestore.priority-topic-update-durations" metric on the statestore to get a reasonable value. Note that priority topic updates are assumed to be small amounts of data that take a small amount of time to process (similar in complexity to a heartbeat).
      Default: 3

      -statestore_max_missed_heartbeats
      The maximum number of consecutive heartbeat messages an impalad can miss before the statestore declares it failed. You should not typically need to change this value.
      Default: 10

      -statestore_subscriber_timeout_secs
      The amount of time, in seconds, that may elapse before the connection with the statestore is considered lost. This should be comparable to (statestore_heartbeat_frequency_ms / 1000 + statestore_heartbeat_tcp_timeout_seconds) * statestore_max_missed_heartbeats, so that subscribers do not re-register themselves too early and the statestore has a chance to resend heartbeats. You can also reference the max value of the "statestore-subscriber.heartbeat-interval-time" metric on the impalads to get a reasonable value.
      Default: 30
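
      The sizing guidance for statestore_subscriber_timeout_secs can be checked with simple arithmetic. A minimal Python sketch using the default values quoted in this issue:

```python
# Defaults quoted in this issue.
statestore_heartbeat_frequency_ms = 1000      # heartbeat send interval
statestore_heartbeat_tcp_timeout_seconds = 3  # per-RPC TCP timeout
statestore_max_missed_heartbeats = 10         # misses before declared failed

# Worst-case failure-detection window: each missed heartbeat can consume
# up to one send interval plus the TCP timeout of the failed RPC.
detection_window_secs = (
    statestore_heartbeat_frequency_ms / 1000
    + statestore_heartbeat_tcp_timeout_seconds
) * statestore_max_missed_heartbeats

print(detection_window_secs)  # 40.0
```

      With the defaults this works out to 40 seconds, somewhat above the default statestore_subscriber_timeout_secs of 30, which illustrates why the issue recommends keeping these flags comparable when tuning either side.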


              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)