Accumulo / ACCUMULO-393

Master not balancing after agitation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.4.0
    • None

    Description

      Ran continuous ingest with agitation for 14 hours. Afterward, the tablets were left in an unbalanced state. The master logs showed the following.

      The master sees a new tablet server, xxx.xxx.xxx.12:9997[235396fb181e0c6]

      12 07:47:19,370 [master.Master] INFO : New servers: [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 07:50:27,199 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[135396fb18ee67f], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.13:9997[135396fb18ee715], xxx.xxx.xxx.9:9997[3353986642be24e], xxx.xxx.xxx.6:9997[135396fb18ee6ca], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee6cb], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      

      The tablet server dies

      12 07:57:44,868 [master.Master] DEBUG: Normal Tablets assigning tablet 6;06e04e;06c056=xxx.xxx.xxx.12:9997[235396fb181e0c6]
      12 08:05:30,984 [master.Master] INFO : New servers: [xxx.xxx.xxx.4:9997[235396fb181e109], xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.9:9997[3353986642be300], xxx.xxx.xxx.6:9997[235396fb181e107], xxx.xxx.xxx.10:9997[235396fb181df91], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6], xxx.xxx.xxx.13:9997[235396fb181e108]]
      12 08:05:56,044 [master.Master] WARN : Lost servers [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:58,718 [master.Master] ERROR: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6] null
      12 08:05:58,718 [master.Master] DEBUG: unable to get tablet server status xxx.xxx.xxx.12:9997[235396fb181e0c6]
      12 08:05:58,721 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:58,728 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:59,065 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.7:9997[3353986642be160], xxx.xxx.xxx.5:9997[3353986642be24f], xxx.xxx.xxx.8:9997[135396fb18ee7ab], xxx.xxx.xxx.12:9997[235396fb181e0c6]]
      12 08:05:59,641 [master.Master] DEBUG: 1 assigned to dead servers: [6;3d40b2;3d20b1@(null,xxx.xxx.xxx.12:9997[235396fb181e0c6],null)]...
      12 08:05:59,715 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
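
The pattern in this excerpt, a server reported as lost and then still named in the "out-of-date" message, can be checked mechanically. What follows is a minimal Python sketch, not part of Accumulo. The function name `blocking_lost_servers` and the regular expression are assumptions made for illustration, derived only from the log line format shown above.

```python
import re

# Matches a server instance such as "xxx.xxx.xxx.12:9997[235396fb181e0c6]".
# The pattern is an assumption based on the log lines in this report.
SERVER_RE = re.compile(r"[\w.]+:\d+\[[0-9a-f]+\]")


def blocking_lost_servers(lines):
    """Return lost server instances that later appear in a 'not balancing' message."""
    lost = set()
    blocking = set()
    for line in lines:
        servers = SERVER_RE.findall(line)
        if "Lost servers" in line:
            lost.update(servers)
        elif "not balancing because" in line:
            # Only count instances the master has already reported as lost.
            blocking.update(s for s in servers if s in lost)
    return blocking


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe a master debug log into the script.
    for server in sorted(blocking_lost_servers(sys.stdin)):
        print(server)
```

Run against the lines above, this would report only xxx.xxx.xxx.12:9997[235396fb181e0c6], since the other servers in the earlier "out-of-date" messages were never reported as lost.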
      

      Another instance of a tablet server starts on xxx.xxx.xxx.12

      12 08:07:35,245 [master.Master] INFO : New servers: [xxx.xxx.xxx.7:9997[3353986642be345], xxx.xxx.xxx.12:9997[235396fb181e15c]]
      

      Much later, it is still not balancing, and the message still names only the dead instance.

      13 16:31:24,131 [master.Master] DEBUG: not balancing because the balance information is out-of-date [xxx.xxx.xxx.12:9997[235396fb181e0c6]]
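
The bracketed id in this last message can be compared with the most recent "New servers" line for the same address. The Python sketch below does that comparison. It is an illustration only: `stale_instances` and the term "instance id" for the bracketed value are assumptions, and the parsing relies solely on the log format shown in this report.

```python
import re

# Splits "host:port[id]" into (address, instance id); format assumed from the excerpt.
INSTANCE_RE = re.compile(r"([\w.]+:\d+)\[([0-9a-f]+)\]")


def stale_instances(lines):
    """Return (address, stale_id, current_id) for each 'not balancing' entry whose
    id differs from the latest id announced for that address in 'New servers'."""
    latest = {}  # address -> most recent instance id seen in a "New servers" line
    stale = set()
    for line in lines:
        instances = INSTANCE_RE.findall(line)
        if "New servers" in line:
            for address, instance_id in instances:
                latest[address] = instance_id
        elif "not balancing because" in line:
            for address, instance_id in instances:
                current = latest.get(address)
                if current is not None and current != instance_id:
                    stale.add((address, instance_id, current))
    return stale
```

A non-empty result means the master is still waiting on an instance that has since been replaced at the same address, which is the situation the two log lines above show for xxx.xxx.xxx.12:9997.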
      

          People

            Assignee: Eric C. Newton (ecn)
            Reporter: Keith Turner (kturner)
