Accumulo / ACCUMULO-393

Master not balancing after agitation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None
    • Labels:

      Description

      After running continuous ingest with agitation for 14 hours, the tablets were left in an unbalanced state. The master logs showed the following.

      The master sees a new tablet server, xxx.xxx.xxx.12:9997[235396fb181e0c6]:

      12 07:47:19,370 [master.Master] INFO : New servers: [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 07:50:27,199 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[135396fb18ee67f], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.13:9997[135396fb18ee715], xxx.xxx.xxx.9:9997[3353986642be24e], xxx.xxx.xxx.6:9997[135396fb18ee6ca], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee6cb], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      

      A tablet is assigned to that tablet server, and about eight minutes later the server dies:

      12 07:57:44,868 [master.Master] DEBUG: Normal Tablets assigning tablet 6;06e04e;06c056=xxx.xxx.xxx.12:9997[235396fb181e0c6]
      12 08:05:30,984 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[235396fb181e109], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.9:9997[3353986642be300], xxx.xxx.xxx.6:9997[235396fb181e107], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6], xxx.xxx.xxx.13:9997[235396fb181e108]]
      12 08:05:56,044 [master.Master] WARN : Lost servers [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:58,718 [master.Master] ERROR: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6] null
      12 08:05:58,718 [master.Master] DEBUG: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6]
      12 08:05:58,721 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:58,728 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:59,065 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:59,641 [master.Master] DEBUG: 1 assigned to dead servers: [6;3d40b2;3d20b1@(null,xxx.xxx.xxx.12:9997[235396fb181e0c6],null)]...
      12 08:05:59,715 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      

      Another instance of a tablet server starts on xxx.xxx.xxx.12, with a new session ID:

      12 08:07:35,245 [master.Master] INFO : New servers: [xxx.xxx.xxx.7:9997[3353986642be345], xxx.xxx.xxx.12:9997[235396fb181e15c]]
      

      Much later, the master is still not balancing, and the out-of-date list still names the dead instance:

      13 16:31:24,131 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      

        Activity

        Eric Newton added a comment -

        Needed to shrink the list of badServers to the list of current servers.
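
The fix described in the comment can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class name `BadServerPruning`, the method `retainCurrent`, and the use of plain strings for server instances are all assumptions made for the example. The server IDs are copied from the logs above.

```java
import java.util.HashSet;
import java.util.Set;

public class BadServerPruning {

    // Hypothetical helper (not Accumulo's real code): drop every entry in
    // badServers that does not name a currently live server instance.
    public static void retainCurrent(Set<String> badServers, Set<String> currentServers) {
        badServers.retainAll(currentServers);
    }

    public static void main(String[] args) {
        // The dead instance that the master kept reporting as out-of-date.
        Set<String> badServers = new HashSet<>();
        badServers.add("xxx.xxx.xxx.12:9997[235396fb181e0c6]");

        // Live instances, including the restarted server with a new session ID.
        Set<String> currentServers = new HashSet<>();
        currentServers.add("xxx.xxx.xxx.7:9997[3353986642be345]");
        currentServers.add("xxx.xxx.xxx.12:9997[235396fb181e15c]");

        retainCurrent(badServers, currentServers);

        // The stale entry is gone, so it can no longer block balancing.
        System.out.println(badServers.isEmpty()); // prints true
    }
}
```

Because the restarted server registers under a different session ID, the old entry never matches a live server again. Unless it is pruned this way, it would presumably stay in the list indefinitely, which is consistent with the "not balancing" message still appearing more than a day later.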


          People

          • Assignee: Eric Newton
          • Reporter: Keith Turner
          • Votes: 0
          • Watchers: 0
