Accumulo / ACCUMULO-444

Data loss possible when tablet killed immediately after recovery


    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.3.5
    • Fix Version/s: 1.3.6, 1.4.0
    • Component/s: tserver
    • Labels:
    • Environment:

      Running random walk, continuous ingest, and agitator on 10 node cluster.


      Came in after a weekend of running tests to find that the Shard random walk test had lost data in its index table. After debugging, I found that the following sequence of events had occurred.

      • Mutation X was written to shard index on Tablet T1
      • X was minor compacted to file F1
      • Tablet server serving T1 was killed
      • When T1 came up on another tablet server, it did not know about F1

      The above sequence of events indicates that the !METADATA table lost data. So I started looking into that and found the following sequence of events.

      • Tablet server T1 serving METADATA tablet MT was killed
      • MT comes up on another tablet server T2
      • Mutation Y is written to MT about file F1 for tablet T1
      • Tablet server T2 is killed.
      • MT comes up in tablet server T3
      • The mutations for MT from T1 are recovered, but those from T2 are not, so Y is lost

      There is code that is supposed to handle this situation, but it is not working. I think this issue exists in 1.3.

      Data loss is not certain in this situation. In the scenario above, when MT is loaded on T2, a minor compaction is started. If the server is killed before this minor compaction completes, then data loss will likely occur.
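
The timing dependence described above can be illustrated with a toy model. This is only a sketch, not Accumulo code: the `MetadataTablet` class, the `simulate` function, and the earlier metadata mutation `W` are hypothetical names introduced here. The model also assumes that T2's log only becomes known to later recoveries once the minor compaction started at load time completes. The report does not state the root cause, so treat that mechanism as an assumption that merely reproduces the observed symptom.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataTablet:
    """Toy stand-in for the metadata tablet MT (hypothetical, not Accumulo code)."""
    files: set = field(default_factory=set)            # entries persisted by minor compaction
    recovery_logs: list = field(default_factory=list)  # servers whose logs recovery replays


def simulate(compaction_completes: bool) -> set:
    """Return the metadata entries visible after MT is loaded on server T3."""
    # Per-server write-ahead logs for MT. "W" is a hypothetical earlier mutation.
    wal = {"T1": ["W"], "T2": []}
    mt = MetadataTablet(recovery_logs=["T1"])

    # T1 is killed. MT loads on T2, which replays T1's log into memory
    # and starts a minor compaction of the recovered mutations.
    recovered_on_t2 = [m for server in mt.recovery_logs for m in wal[server]]

    # Mutation Y (file F1 for tablet T1) is written to MT and logged on T2.
    wal["T2"].append("Y")

    if compaction_completes:
        # Recovered mutations are now durable in a file.
        mt.files.update(recovered_on_t2)
        # ASSUMPTION: only at this point does recovery learn about T2's log.
        mt.recovery_logs = ["T2"]

    # T2 is killed. MT loads on T3, which replays only the logs it knows about.
    recovered_on_t3 = [m for server in mt.recovery_logs for m in wal[server]]
    return mt.files | set(recovered_on_t3)
```

In this model, `simulate(True)` returns a set containing both `W` and `Y`, while `simulate(False)` returns only `W`: the mutations from T1 are recovered, but Y, which exists only in T2's log, is never replayed.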




          • Assignee:
            Keith Turner


            • Created: