Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.1, 1.6.0
    • Component/s: master, tserver
    • Labels:
      None
    • Environment:

      hadoop 2.2.0 and zookeeper 3.4.5

      Description

      Anthony F reports the following:

      I have observed a loss of data when tservers fail during bulk ingest. The missing keys are right around the table's splits, indicating that data was lost when a tserver died during a split. I am using Accumulo 1.5.0. At around the same time, I observe the master logging a message about "Found two locations for the same extent".

      And:

      I'm currently digging through the logs and will report back. Keep in mind that I'm using Accumulo 1.5.0 on a Hadoop 2.2.0 stack. To determine data loss, I have a 'ConsistencyCheckingIterator' that verifies each row has the expected data (it takes a long time to scan the whole table).

      Below is a quick summary of what happened. The tablet in question is "d;72~gcm~201304". Notice that it is assigned to 192.168.2.233:9997[343bc1fa155242c] at 2014-01-25 09:49:36,233. At 2014-01-25 09:49:54,141, that tserver goes away. The tablet is then assigned to 192.168.2.223:9997[143bc1f14412432], and shortly after that I see the BadLocationStateException (BLSE). The master never recovers from the BLSE; I have to manually delete one of the offending locations.

      2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]
      2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning tablet p;18~thm~2012101;18=192.168.2.233:9997[343bc1fa155242c]
      2014-01-25 09:49:54,141 [master.Master] WARN : Lost servers [192.168.2.233:9997[343bc1fa155242c]]
      2014-01-25 09:49:56,866 [master.Master] DEBUG: 42 assigned to dead servers: [d;03~u36~201302;03~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null), d;06~u36~2013;06~thm~2012083@(null,192.168.2.233:9997[343bc1fa155242c],null), d;25;24~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null), d;25~u36~201303;25~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null), d;27~gcm~2013041;27@(null,192.168.2.233:9997[343bc1fa155242c],null), d;30~u36~2013031;30~thm~2012082@(null,192.168.2.233:9997[343bc1fa155242c],null), d;34~thm;34~gcm~2013022@(null,192.168.2.233:9997[343bc1fa155242c],null), d;39~thm~20121;39~gcm~20130418@(null,192.168.2.233:9997[343bc1fa155242c],null), d;41~thm;41~gcm~2013041@(null,192.168.2.233:9997[343bc1fa155242c],null), d;42~u36~201304;42~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), d;45~thm~201208;45~gcm~201303@(null,192.168.2.233:9997[343bc1fa155242c],null), d;48~gcm~2013052;48@(null,192.168.2.233:9997[343bc1fa155242c],null), d;60~u36~2013021;60~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), d;68~gcm~2013041;68@(null,192.168.2.233:9997[343bc1fa155242c],null), d;72;71~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null), d;72~gcm~201304;72@(192.168.2.233:9997[343bc1fa155242c],null,null), d;75~thm~2012101;75~gcm~2013032@(null,192.168.2.233:9997[343bc1fa155242c],null), d;78;77~u36~201305@(null,192.168.2.233:9997[343bc1fa155242c],null), d;90~u36~2013032;90~thm~2012092@(null,192.168.2.233:9997[343bc1fa155242c],null), d;91~thm;91~gcm~201304@(null,192.168.2.233:9997[343bc1fa155242c],null), d;93~u36~2013012;93~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), m;20;19@(null,192.168.2.233:9997[343bc1fa155242c],null), m;38;37@(null,192.168.2.233:9997[343bc1fa155242c],null), m;51;50@(null,192.168.2.233:9997[343bc1fa155242c],null), m;60;59@(null,192.168.2.233:9997[343bc1fa155242c],null), m;92;91@(null,192.168.2.233:9997[343bc1fa155242c],null), o;01<@(null,192.168.2.233:9997[343bc1fa155242c],null), o;04;03@(null,192.168.2.233:9997[343bc1fa155242c],null), o;50;49@(null,192.168.2.233:9997[343bc1fa155242c],null), o;63;62@(null,192.168.2.233:9997[343bc1fa155242c],null), o;74;73@(null,192.168.2.233:9997[343bc1fa155242c],null), o;97;96@(null,192.168.2.233:9997[343bc1fa155242c],null), p;08~thm~20121;08@(null,192.168.2.233:9997[343bc1fa155242c],null), p;09~thm~20121;09@(null,192.168.2.233:9997[343bc1fa155242c],null), p;10;09~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null), p;18~thm~2012101;18@(192.168.2.233:9997[343bc1fa155242c],null,null), p;21;20~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null), p;22~thm~2012091;22@(null,192.168.2.233:9997[343bc1fa155242c],null), p;23;22~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null), p;41~thm~2012111;41@(null,192.168.2.233:9997[343bc1fa155242c],null), p;42;41~thm~2012111@(null,192.168.2.233:9997[343bc1fa155242c],null), p;58~thm~201208;58@(null,192.168.2.233:9997[343bc1fa155242c],null)]...
      2014-01-25 09:49:59,706 [master.Master] DEBUG: Normal Tablets assigning tablet d;72~gcm~201304;72=192.168.2.223:9997[143bc1f14412432]
      2014-01-25 09:50:13,515 [master.EventCoordinator] INFO : tablet d;72~gcm~201304;72 was loaded on 192.168.2.223:9997
      2014-01-25 09:51:20,058 [state.MetaDataTableScanner] ERROR: java.lang.RuntimeException: org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException: found two locations for the same extent d;72~gcm~201304: 192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
      java.lang.RuntimeException: org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException: found two locations for the same extent d;72~gcm~201304: 192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
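
      The tablet entries in the "assigned to dead servers" line appear to follow the format extent@(future,current,last). Read that way, d;72~gcm~201304;72 (like p;18~thm~2012101;18) had only been assigned (future) to 192.168.2.233 when it died, while the other tablets were hosted (current) there. The BLSE then looks like the scanner finding two current-location entries in one metadata row. The Java sketch below models that check in simplified form. Everything in it is an assumption: the class, record, and method names are hypothetical stand-ins rather than Accumulo's API, and the column family names "future", "loc", and "last" are assumed to match the metadata table layout.

```java
import java.util.List;

/** Hypothetical, simplified model of folding a tablet's metadata row into a location state. */
public class LocationStateSketch {

  /** One location column of a metadata row; family is "future", "loc" or "last" (assumed names). */
  record Entry(String family, String server) {}

  /** Hypothetical stand-in for BadLocationStateException; unchecked here for brevity. */
  static class BadLocationStateException extends RuntimeException {
    BadLocationStateException(String message) {
      super(message);
    }
  }

  /** Location state printed in the extent@(future,current,last) shape seen in the master log. */
  record State(String extent, String future, String current, String last) {
    @Override
    public String toString() {
      return extent + "@(" + future + "," + current + "," + last + ")";
    }
  }

  /** Builds a State from a row's location columns, rejecting rows with more than one location. */
  static State fromRow(String extent, List<Entry> row) {
    String future = null;
    String current = null;
    String last = null;
    for (Entry e : row) {
      switch (e.family()) {
        case "future" -> {
          if (future != null) {
            throw new BadLocationStateException(
                "two future locations for " + extent + ": " + future + " and " + e.server());
          }
          future = e.server();
        }
        case "loc" -> {
          if (current != null) {
            // Same wording as the BLSE in the log above.
            throw new BadLocationStateException("found two locations for the same extent "
                + extent + ": " + current + " and " + e.server());
          }
          current = e.server();
        }
        case "last" -> last = e.server();
        default -> { } // other columns do not affect the location state
      }
    }
    if (future != null && current != null) {
      // A tablet should not be assigned to one server while hosted by another.
      throw new BadLocationStateException(
          extent + " is assigned to " + future + " but hosted by " + current);
    }
    return new State(extent, future, current, last);
  }

  public static void main(String[] args) {
    String dead = "192.168.2.233:9997[343bc1fa155242c]";
    String live = "192.168.2.223:9997[143bc1f14412432]";

    // Assigned to the dead tserver but never loaded.
    // prints "d;72~gcm~201304;72@(192.168.2.233:9997[343bc1fa155242c],null,null)"
    System.out.println(fromRow("d;72~gcm~201304;72", List.of(new Entry("future", dead))));

    // Two current locations left in one row: the state the master could not recover from.
    try {
      fromRow("d;72~gcm~201304", List.of(new Entry("loc", live), new Entry("loc", dead)));
    } catch (BadLocationStateException ex) {
      System.out.println(ex.getMessage());
    }
  }
}
```

      The first printed line has the same form as the d;72~gcm~201304;72 entry in the dead-server list, and the second matches the BLSE text. Once such a row exists, every scan of it fails the same way, which is consistent with the report that the master never recovered until one location was deleted by hand.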
      

              People

              • Assignee:
                Eric Newton (ecn)
                Reporter:
                Eric Newton (ecn)
              • Votes:
                0
                Watchers:
                5
