ACCUMULO-315

Hole in metadata table occurred during random walk test


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: master, tserver
    • Environment: Running 1.4.0 SNAPSHOT on 10 node cluster.

    Description

      While running the random walk test, a hole occurred in the metadata table. A client tried to delete the table with the hole, and the FATE operation got stuck. The master logs continually showed the following.

      14 00:02:11,273 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4ct locationState: 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef@(null,xxx.xxx.xxx.xxx:9997[134d7425fc503e1],null)
      

      The metadata table contained the following. Tablet 4ct;4d2d3be2823b0bf4 had a location.

      4ct;262249211a62cd6f ~tab:~pr []    \x011819e56edae21302
      4ct;27b693c626c2d4ef ~tab:~pr []    \x01262249211a62cd6f
      4ct;43422047c78fa52b ~tab:~pr []    \x0141ea825af0f262d9
      4ct;4d2d3be2823b0bf4 ~tab:~pr []    \x0127b693c626c2d4ef
      4ct;4f89df61392bb311 ~tab:~pr []    \x014d2d3be2823b0bf4
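
To make the hole in the listing above easier to see, here is a minimal Python sketch that walks the five entries in end-row order and reports every tablet whose prevRow does not equal the end row of the tablet before it. The `find_breaks` helper and the hole/overlap labels are illustrative names, not part of Accumulo, and the `\x01` prefix on the stored values is dropped for readability.

```python
# Metadata entries copied from the listing above: (end row, prev end row).
# The \x01 prefix on the stored prevRow value is omitted for readability.
ENTRIES = [
    ("262249211a62cd6f", "1819e56edae21302"),
    ("27b693c626c2d4ef", "262249211a62cd6f"),
    ("43422047c78fa52b", "41ea825af0f262d9"),
    ("4d2d3be2823b0bf4", "27b693c626c2d4ef"),
    ("4f89df61392bb311", "4d2d3be2823b0bf4"),
]


def find_breaks(entries):
    """Return (previous end row, end row, prev row) for each broken link.

    `entries` must be sorted by end row, as a metadata scan returns them.
    A link is intact when a tablet's prev row equals the end row of the
    tablet immediately before it. (Hypothetical helper, not Accumulo code.)
    """
    breaks = []
    for (before_end, _), (end, prev) in zip(entries, entries[1:]):
        if prev != before_end:
            breaks.append((before_end, end, prev))
    return breaks


for before_end, end, prev in find_breaks(ENTRIES):
    # Fixed-width lowercase hex strings compare correctly as plain strings.
    kind = "hole" if prev > before_end else "overlap"
    print(f"{kind}: tablet {end} has prevRow {prev}, expected {before_end}")
```

On this data the check flags two links: nothing covers the range 27b693c626c2d4ef to 41ea825af0f262d9 (the hole), and tablet 4d2d3be2823b0bf4 reaches back over the tablet ending at 43422047c78fa52b.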
      

      Found the following events on a tablet server.

      #the tablet server events below are caused by the delete range operation
      13 21:36:04,287 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;262249211a62cd6f split 4ct;27b693c626c2d4ef;262249211a62cd6f 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef
      
      13 21:36:04,369 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef split 4ct;41ea825af0f262d9;27b693c626c2d4ef 4ct;4d2d3be2823b0bf4;41ea825af0f262d9
      
      13 21:36:04,370 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 opened
      
      13 21:36:06,141 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 closed
      13 21:36:06,142 [tabletserver.Tablet] DEBUG: Files for low split 4ct;43422047c78fa52b;41ea825af0f262d9  [/t-0001cdi/F0001bmw.rf, /t-0001cdi/F0001bn1.rf]
      13 21:36:06,142 [tabletserver.Tablet] DEBUG: Files for high split 4ct;4d2d3be2823b0bf4;43422047c78fa52b  [/t-0001cdi/A0001cef.rf, /t-0001cdi/F0001bmw.rf, /t-0001cdi/F0001bn1.rf]
      
      #split from other random walker
      13 21:36:06,351 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;41ea825af0f262d9 split 4ct;43422047c78fa52b;41ea825af0f262d9 4ct;4d2d3be2823b0bf4;43422047c78fa52b
      

      The following events occurred on the master and overlap in time with the split on the tablet server.

      13 21:36:06,312 [master.EventCoordinator] INFO : Merge state of 4ct;41ea825af0f262d9;27b693c626c2d4ef set to MERGING
      13 21:36:06,312 [master.Master] DEBUG: Deleting tablets for 4ct;41ea825af0f262d9;27b693c626c2d4ef
      13 21:36:06,316 [master.Master] DEBUG: Found following tablet 4ct;4d2d3be2823b0bf4;43422047c78fa52b
      13 21:36:06,317 [master.Master] DEBUG: Making file deletion entries for 4ct;41ea825af0f262d9;27b693c626c2d4ef
      13 21:36:06,325 [master.Master] DEBUG: Removing metadata table entries in range [4ct;27b693c626c2d4ef%00; : [] 9223372036854775807 false,4ct;41ea825af0f262d9%00; : [] 9223372036854775807 false)
      13 21:36:06,331 [master.Master] DEBUG: Updating prevRow of 4ct;4d2d3be2823b0bf4;43422047c78fa52b to 27b693c626c2d4ef
      

      After many hours of debugging, Eric and I figured out what was going on. Two random walkers were running the concurrent test. One client initiated a delete range on table id 4ct for the range 27b693c626c2d4ef to 41ea825af0f262d9. While this delete range operation was in progress, another client added the split point 43422047c78fa52b. The master read the metadata table while the split was occurring and got inconsistent, incomplete information about which tablets related to the delete range operation were online. It assumed the required tablets were offline when they were not. The log messages above show that the split and the master's update of the prevRow overlap in time.
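
The interleaving can be replayed on a toy model of the metadata table. The Python sketch below is a reconstruction, not Accumulo code. It assumes the split first rewrites the prevRow of the high tablet and only later inserts the entry for the low tablet, which would explain why the master log reports the following tablet as 4ct;4d2d3be2823b0bf4;43422047c78fa52b before the split was logged as complete. Rows are abbreviated to their first four hex digits.

```python
def show(table):
    """Format a {end row: prev row} map in end-row order."""
    return [f"{end} <- {table[end]}" for end in sorted(table)]


# Assumed state of table 4ct just before the overlap, abbreviated to the
# first four hex digits of each row: {end row: prev end row}.
metadata = {
    "2622": "1819",
    "27b6": "2622",
    "41ea": "27b6",  # tablet covered by the delete range
    "4d2d": "41ea",  # tablet being split at 4342 by the other walker
    "4f89": "4d2d",
}

# Tablet server, split step 1 (assumed ordering): shrink the high tablet.
metadata["4d2d"] = "4342"

# Master: scan for the tablet following the delete range. The low half of
# the split (end row 4342) is not written yet, so the scan lands on 4d2d.
following = min(end for end in metadata if end > "41ea")

# Master: remove entries in (27b6, 41ea], then point the following tablet
# back at the start of the deleted range.
for end in [e for e in metadata if "27b6" < e <= "41ea"]:
    del metadata[end]
metadata[following] = "27b6"

# Tablet server, split step 2: add the low tablet of the split.
metadata["4342"] = "41ea"

print("\n".join(show(metadata)))
```

Under these assumptions the final state has the same shape as the metadata listing in this report: the tablet ending at 4342 still points back to 41ea, which no longer exists, while 4d2d points back to 27b6.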

      We think the best solution is to ensure that scans of the metadata table for merge and delete range operations are consistent with respect to end row and prev end row matching. Tablets cannot be considered individually; the portion of the metadata table under consideration must form a proper sorted linked list.
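
One way to picture the proposed check is a scan wrapper that only accepts a result when the returned tablets chain together from the expected starting prev row to the expected last end row. The Python sketch below is an illustration under assumptions, not the fix that was committed: `scan` is a hypothetical callable standing in for a metadata table read, and retrying after a short delay is just one possible response to an inconsistent read.

```python
import time


def is_linked_list(entries, start_prev, last_end):
    """True when sorted (end row, prev row) pairs chain from start_prev to last_end."""
    if not entries or entries[-1][0] != last_end:
        return False
    expected_prev = start_prev
    for end, prev in entries:
        if prev != expected_prev:
            return False
        expected_prev = end
    return True


def consistent_scan(scan, start_prev, last_end, attempts=5, delay=0.1):
    """Re-run `scan` until it returns a well-formed chain of tablets.

    `scan` is a hypothetical zero-argument callable returning the sorted
    (end row, prev row) pairs for the range under consideration.
    """
    for _ in range(attempts):
        entries = scan()
        if is_linked_list(entries, start_prev, last_end):
            return entries
        time.sleep(delay)  # a split may be in flight; give it time to finish
    raise RuntimeError("metadata range never formed a consistent chain")


# Usage: the first read catches the split halfway, the second sees it complete.
snapshots = iter([
    [("41ea", "27b6"), ("4d2d", "4342")],                    # 4342 missing
    [("41ea", "27b6"), ("4342", "41ea"), ("4d2d", "4342")],  # split complete
])
result = consistent_scan(lambda: next(snapshots), "27b6", "4d2d", delay=0)
print(result)
```

The key design point is that the check validates the whole range at once, so a tablet that looks fine in isolation is still rejected when its neighbour is missing.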

          People

            Assignee: Keith Turner
            Reporter: Keith Turner