Accumulo / ACCUMULO-368

tablet had location but was not loaded


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.5-incubating
    • Fix Version/s: 1.4.0
    • Component/s: tserver
    • Environment: Running random walk test against 1.4.0-SNAP on 10 node cluster

    Description

      While running the random walk test, a delete range operation hung because it could not split a tablet. The tablet in question failed to load because the tablet server thought the tablet was already being served.

      03 11:19:18,249 [tabletserver.Tablet] TABLET_HIST: 3nq;77cd1e415c4547a4< split 3nq;133660072804a502< 3nq;77cd1e415c4547a4;133660072804a502
      03 11:19:18,249 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< opened 
      03 11:19:26,236 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< import /b-0005t8f/I0005t8g.rf 388308 0
      03 11:19:45,672 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< MinC [memory] -> /t-0005typ/F0005tz4.rf
      03 11:19:45,686 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< closed
      03 11:19:45,840 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< opened
      03 11:19:45,987 [tabletserver.Tablet] TABLET_HIST: 3nq;133660072804a502< closed
      03 11:19:46,142 [tabletserver.TabletServer] INFO : Loading tablet 3nq;133660072804a502<
      03 11:19:46,144 [tabletserver.TabletServer] ERROR: Tablet seems to be already assigned to xxx.xxx.xxx.9:9997[135396fb18d3fb0]
      03 11:19:46,144 [tabletserver.TabletServer] INFO : Reporting tablet 3nq;133660072804a502< assignment failure: unable to verify Tablet Information
      

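      Looking at the walogs below, it seems that the mutations for the last successful open and close were written in reverse order: the delete of loc at timestamp 960326 comes before the mutation at timestamp 960332 that sets loc and deletes future.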

      1 mutations:
        3nq;133660072804a502
            ~tab:~pr [system]:959756 [] ^@
            srv:dir [system]:959756 [] /t-0005typ
            srv:time [system]:959756 [] M1328267935757
            loc:135396fb18d3fb0 [system]:959756 [] xxx.xxx.xxx.9:9997
            future:135396fb18d3fb0 [system]:959756 [] <deleted>
            srv:lock [system]:959756 [] tservers/xxx.xxx.xxx.9:9997/zlock-0000000000$135396fb18d3fb0
      
      MUTATION 6462 5
      1 mutations:
        3nq;133660072804a502
            file:/b-0005t8f/I0005t8g.rf [system]:959986 [] 388308,0
            loaded:/b-0005t8f/I0005t8g.rf [system]:959986 [] 1681970597222144296
            srv:time [system]:959986 [] M1328267935757
            srv:lock [system]:959986 [] tservers/xxx.xxx.xxx.9:9997/zlock-0000000000$135396fb18d3fb0
      
      MUTATION 6462 5
      1 mutations:
        3nq;133660072804a502
            file:/t-0005typ/F0005tz4.rf [system]:960298 [] 185156,44330
            srv:time [system]:960298 [] M1328267963158
            last:135396fb18d3fb0 [system]:960298 [] xxx.xxx.xxx.9:9997
            log:xxx.xxx.xxx.12:11224/cad1617c-5fb2-4057-abec-8edd46d0cf7a [system]:960298 [] <deleted>
            log:xxx.xxx.xxx.5:11224/50611604-8e6c-48a8-8e16-eb739a991721 [system]:960298 [] <deleted>
            srv:flush [system]:960298 [] 0
            srv:lock [system]:960298 [] tservers/xxx.xxx.xxx.9:9997/zlock-0000000000$135396fb18d3fb0
      
      MANY_MUTATIONS 6462 5
      1 mutations:
        3nq;133660072804a502
            loc:135396fb18d3fb0 [system]:960302 [] <deleted>
      
      MANY_MUTATIONS 6462 5
      1 mutations:
        3nq;133660072804a502
            future:135396fb18d3fb0 [system]:960321 [] xxx.xxx.xxx.9:9997
      
      MANY_MUTATIONS 6462 5
      1 mutations:
        3nq;133660072804a502
            loc:135396fb18d3fb0 [system]:960326 [] <deleted>
      
      MANY_MUTATIONS 6462 5
      1 mutations:
        3nq;133660072804a502
            loc:135396fb18d3fb0 [system]:960332 [] xxx.xxx.xxx.9:9997
            future:135396fb18d3fb0 [system]:960332 [] <deleted>
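
      To make the effect of that ordering concrete, here is a minimal sketch that replays the last four walog mutations shown above against an in-memory row. It is a simplified model, not Accumulo code: the class MetadataReplay, the Update type, and the replay method are hypothetical names, and "latest timestamp wins per column" is an assumption that stands in for the real delete semantics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MetadataReplay {
    /** One column update from the walog dump; a null value models "<deleted>". */
    static final class Update {
        final long timestamp;
        final String column;
        final String value;

        Update(long timestamp, String column, String value) {
            this.timestamp = timestamp;
            this.column = column;
            this.value = value;
        }
    }

    /** Applies updates in timestamp order and returns the columns that survive. */
    static Map<String, String> replay(List<Update> updates) {
        List<Update> sorted = new ArrayList<>(updates);
        sorted.sort(Comparator.comparingLong((Update u) -> u.timestamp));
        Map<String, String> row = new TreeMap<>();
        for (Update u : sorted) {
            if (u.value == null) {
                row.remove(u.column);      // delete marker hides the column
            } else {
                row.put(u.column, u.value);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        String session = "135396fb18d3fb0";
        String server = "xxx.xxx.xxx.9:9997";
        // Timestamps and columns copied from the walog dump above.
        List<Update> updates = List.of(
            new Update(960302, "loc:" + session, null),
            new Update(960321, "future:" + session, server),
            new Update(960326, "loc:" + session, null),
            new Update(960332, "loc:" + session, server),
            new Update(960332, "future:" + session, null));
        // The row ends with a loc entry even though the tablet was closed.
        System.out.println(replay(updates));
    }
}
```

      Under this model the row ends up with only the loc column set, which matches the "Tablet seems to be already assigned" error logged at 11:19:46,144.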
      

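      Looking at the tablet server code, a tablet is first put in the online tablets and only then is its location written to the metadata table. Once the tablet is in the online tablets, it can be unloaded. I think that is what happened here: in the short window between putting the tablet in the online tablets and writing its location to the metadata table, the tablet was unloaded. The unload's delete of the location was therefore written before the load's write of the location, which left the metadata table with a location for a tablet that was not loaded.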
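
      The suspected interleaving can be sketched as a small deterministic model. Everything here is hypothetical and for illustration only: LoadUnloadRace, its fields, and its methods are not the tablet server's actual classes, and the reordered variant is just one possible way to close the window, not necessarily the change that went into 1.4.0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LoadUnloadRace {
    // Hypothetical stand-ins for the online tablets and the metadata loc column.
    final Set<String> onlineTablets = new HashSet<>();
    final Map<String, String> metadataLoc = new HashMap<>();
    final String server = "xxx.xxx.xxx.9:9997";

    /** Load step: make the tablet visible as online. */
    void publishOnline(String extent) {
        onlineTablets.add(extent);
    }

    /** Load step: record this server as the tablet's location. */
    void writeLocation(String extent) {
        metadataLoc.put(extent, server);
    }

    /** Unload only acts on tablets that are visible as online. */
    boolean unload(String extent) {
        if (!onlineTablets.remove(extent)) {
            return false;
        }
        metadataLoc.remove(extent);
        return true;
    }

    /** True when metadata names a location but the tablet is not online. */
    boolean isStranded(String extent) {
        return metadataLoc.containsKey(extent) && !onlineTablets.contains(extent);
    }

    /** Order described in the report: publish, unload slips in, then write loc. */
    static LoadUnloadRace racyInterleaving(String extent) {
        LoadUnloadRace ts = new LoadUnloadRace();
        ts.publishOnline(extent);
        ts.unload(extent);
        ts.writeLocation(extent);
        return ts;
    }

    /** Assumed alternative: write loc first, so an early unload finds nothing. */
    static LoadUnloadRace reorderedInterleaving(String extent) {
        LoadUnloadRace ts = new LoadUnloadRace();
        ts.writeLocation(extent);
        ts.unload(extent);          // returns false: tablet is not yet visible
        ts.publishOnline(extent);
        return ts;
    }

    public static void main(String[] args) {
        String extent = "3nq;133660072804a502<";
        System.out.println(racyInterleaving(extent).isStranded(extent));
        System.out.println(reorderedInterleaving(extent).isStranded(extent));
    }
}
```

      In the first interleaving the tablet ends up with a location but is not online, which is the state in the title of this issue. The reordered variant avoids that state in this model, but it does not address other failure cases, such as the server dying after writing the location and before bringing the tablet online.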

          People

            Assignee: Keith Turner (kturner)
            Reporter: Keith Turner (kturner)