Details
-
Documentation
-
Status: Resolved
-
Major
-
Resolution: Fixed
Description
The current document about statestore scalability is http://impala.apache.org/docs/build/html/topics/impala_scalability.html#ariaid-title3
We should add statestore_heartbeat_tcp_timeout_seconds and statestore_subscriber_timeout_secs to the doc. They also affect heartbeat timeout detection. Incorrect settings may cause false positives in liveness detection and queries failing with the "Cancelled due to unreachable impalad(s)" error.
Current document:
---------------------------------------------------
Scalability Considerations for the Impala Statestore
Before Impala 2.1, the statestore sent only one kind of message to its subscribers. This message contained all updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a subscriber to decide whether or not the subscriber had failed.
Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or restarted.
As of Impala 2.1, the statestore now sends topic updates and heartbeats in separate messages. This allows the statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to send topic updates according to a fixed schedule, reducing statestore network overhead.
The statestore now has the following relevant configuration flags for the statestored daemon:
-statestore_num_update_threads
The number of threads inside the statestore dedicated to sending topic updates. You should not typically need to change this value.
Default: 10
-statestore_update_frequency_ms
The frequency, in milliseconds, with which the statestore tries to send topic updates to each subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends topic updates as fast as it can. You should not typically need to change this value.
Default: 2000
-statestore_num_heartbeat_threads
The number of threads inside the statestore dedicated to sending heartbeats. You should not typically need to change this value.
Default: 10
-statestore_heartbeat_frequency_ms
The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber. This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that, you might need to increase this value to make the interval longer between heartbeat messages.
Default: 1000 (one heartbeat message every second)
If it takes a very long time for a cluster to start up, and impala-shell consistently displays This Impala daemon is not ready to accept user requests, the statestore might be taking too long to send the entire catalog topic to the cluster. In this case, consider adding --load_catalog_in_background=false to your catalog service configuration. This setting stops the statestore from loading the entire catalog into memory at cluster startup. Instead, metadata for each table is loaded when the table is accessed for the first time.
-----------------------------------------------------
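For reference, the four flags and defaults listed in the current document can be written out as statestored startup arguments. This is only an illustrative sketch: the helper name render_flags is hypothetical (not part of Impala), and it assumes the usual gflags "-name=value" form.

```python
# Defaults as listed in the current document for the statestored daemon.
STATESTORE_DEFAULTS = {
    "statestore_num_update_threads": 10,
    "statestore_update_frequency_ms": 2000,
    "statestore_num_heartbeat_threads": 10,
    "statestore_heartbeat_frequency_ms": 1000,
}


def render_flags(overrides=None):
    """Return statestored arguments in -name=value form (hypothetical helper)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(STATESTORE_DEFAULTS)
    if unknown:
        # Refuse flags that are not in the documented list above.
        raise KeyError(f"unknown flag(s): {sorted(unknown)}")
    merged = {**STATESTORE_DEFAULTS, **overrides}
    return [f"-{name}={value}" for name, value in merged.items()]


# Example: lengthen the heartbeat interval for a cluster well beyond 150 nodes.
args = render_flags({"statestore_heartbeat_frequency_ms": 2000})
```

Only the heartbeat frequency is overridden here, since the document says the other three values should not typically need to change.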
We need to add these:
-----------------------------------------------------
-statestore_heartbeat_tcp_timeout_seconds
The time, in seconds, after which a heartbeat RPC to a subscriber will time out. This setting protects against badly hung machines that cannot respond to the heartbeat RPC promptly. Increase this value if intermittent heartbeat RPC timeouts appear in the statestore's log. The maximum value of the "statestore.priority-topic-update-durations" metric on the statestore can help you choose a reasonable value. Note that priority topic updates are assumed to be small amounts of data that take little time to process (similar in complexity to a heartbeat).
Default: 3
-statestore_max_missed_heartbeats
Maximum number of consecutive heartbeat messages an impalad can miss before being declared failed by the statestore. You should not typically need to change this value.
Default: 10
-statestore_subscriber_timeout_secs
The amount of time (in seconds) that may elapse before a subscriber considers its connection with the statestore lost. This should be comparable to (statestore_heartbeat_frequency_ms / 1000 + statestore_heartbeat_tcp_timeout_seconds) * statestore_max_missed_heartbeats, so that subscribers do not reregister themselves too early and the statestore has a chance to resend heartbeats. The maximum value of the "statestore-subscriber.heartbeat-interval-time" metric on the impalads can also help you choose a reasonable value.
Default: 30
-----------------------------------------------------
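The guideline for statestore_subscriber_timeout_secs can be worked through as a small calculation. The sketch below is a rough aid rather than anything shipped with Impala: the function name is hypothetical, its defaults are the flag defaults quoted above, and it converts the millisecond heartbeat frequency to seconds so that the units agree.

```python
def suggested_subscriber_timeout_secs(
    heartbeat_frequency_ms=1000,       # statestore_heartbeat_frequency_ms
    heartbeat_tcp_timeout_seconds=3,   # statestore_heartbeat_tcp_timeout_seconds
    max_missed_heartbeats=10,          # statestore_max_missed_heartbeats
):
    """Estimate a value comparable to the guideline for the subscriber timeout."""
    # Worst case for one heartbeat attempt: wait one interval, then hit the RPC timeout.
    per_attempt_secs = heartbeat_frequency_ms / 1000 + heartbeat_tcp_timeout_seconds
    # The statestore tolerates this many consecutive misses before declaring failure.
    return per_attempt_secs * max_missed_heartbeats


default_estimate = suggested_subscriber_timeout_secs()        # 40.0
tuned_estimate = suggested_subscriber_timeout_secs(2000, 10)  # 120.0
```

With the default flag values the estimate is 40 seconds, somewhat above the default statestore_subscriber_timeout_secs of 30. If the heartbeat frequency or TCP timeout is raised, the subscriber timeout should be raised along with it.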
Commit 795dc8e985660c56bfdaf0142c0ff3fba61d0ca1 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=795dc8e ]
IMPALA-10788: [DOCS] Introduce more flags for Statestore Scalability
statestore_subscriber_timeout_secs is also an important flag for
statestore scalability. Especially when the heartbeat frequency and
timeout are bumped up, this flag should be bumped as well.
To introduce this flag, we should also introduce the heartbeat tcp
timeout flag and the max missed heartbeat flag.
Tests:
Change-Id: Ia4b331693c5c0945f4cec8fd81ed9ec688563333
Reviewed-on: http://gerrit.cloudera.org:8080/17675
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Shajini Thayasingh <sthayasingh@cloudera.com>
Reviewed-by: Tamas Mate <tmate@cloudera.com>