Description
Ran continuous ingest with agitation for 14 hours. After this the tablets were left in an unbalanced state. Saw the following in the master logs.
A new tablet server, xxx.xxx.xxx.12:9997[235396fb181e0c6], appears:
12 07:47:19,370 [master.Master] INFO : New servers: [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 07:50:27,199 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[135396fb18ee67f], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.13:9997[135396fb18ee715], xxx.xxx.xxx.9:9997[3353986642be24e], xxx.xxx.xxx.6:9997[135396fb18ee6ca], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee6cb], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
The tablet server dies
12 07:57:44,868 [master.Master] DEBUG: Normal Tablets assigning tablet 6;06e04e;06c056=xxx.xxx.xxx.12:9997[235396fb181e0c6]
12 08:05:30,984 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[235396fb181e109], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.9:9997[3353986642be300], xxx.xxx.xxx.6:9997[235396fb181e107], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6], xxx.xxx.xxx.13:9997[235396fb181e108]]
12 08:05:56,044 [master.Master] WARN : Lost servers [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:58,718 [master.Master] ERROR: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6] null
12 08:05:58,718 [master.Master] DEBUG: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6]
12 08:05:58,721 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:58,728 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:59,065 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
12 08:05:59,641 [master.Master] DEBUG: 1 assigned to dead servers: [6;3d40b2;3d20b1@(null,xxx.xxx.xxx.12:9997[235396fb181e0c6],null)]...
12 08:05:59,715 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
Another tablet server instance starts on xxx.xxx.xxx.12:
12 08:07:35,245 [master.Master] INFO : New servers: [xxx.xxx.xxx.7:9997[3353986642be345], xxx.xxx.xxx.12:9997[235396fb181e15c]]
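Much later, it's still not balancing for some reason. The server it reports as out-of-date is the dead instance [235396fb181e0c6], not the new instance [235396fb181e15c]: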
13 16:31:24,131 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
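To check whether other master logs show the same pattern, a small script along these lines could help. This is only a diagnostic sketch, not part of Accumulo: the function name `stale_balance_blockers` and the regular expression are assumptions based solely on the message formats shown in the excerpts above. It flags every "not balancing" message that names a server instance already reported under "Lost servers".

```python
import re

# Matches a tablet server instance as printed in the log excerpts above,
# e.g. xxx.xxx.xxx.12:9997[235396fb181e0c6]. The pattern is an assumption
# derived from those excerpts only.
SERVER = re.compile(r"[\w.]+:\d+\[[0-9a-f]+\]")

NOT_BALANCING = "not balancing because the balance information is out-of-date"


def stale_balance_blockers(lines):
    """Return (log line, servers) pairs where a 'not balancing' message
    names a server that an earlier 'Lost servers' message reported."""
    lost = set()
    hits = []
    for line in lines:
        servers = SERVER.findall(line)
        if "Lost servers" in line:
            # Remember every instance the master has declared lost.
            lost.update(servers)
        elif NOT_BALANCING in line:
            # Keep only the blockers that were previously declared lost.
            stale = [s for s in servers if s in lost]
            if stale:
                hits.append((line.rstrip("\n"), stale))
    return hits
```

Messages logged within a few seconds of the "Lost servers" warning may be transient, so the timestamps of the flagged lines still need to be read by hand. Hits that persist long after the loss, like the one at 13 16:31:24,131, are the ones that match this report.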