Dan Harvey, who is still on 0.20.x, had a similar issue this month. He added four new servers to his cluster, and these new servers were not resolving properly. What we were seeing, I believe, is that on startup these new servers would be assigned their portion of the regions on checkin. Then the basescanner would run (it's 0.20.x HBase) and it would not recognize the address the new servers were writing to .META.; it would then think the regions unassigned and assign them elsewhere. So we had double assignment while splits and compactions were running at the same time. His .META. ended up with holes and overlaps.
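To make "holes and overlaps" concrete, here is a minimal sketch of how a table's region key ranges could be checked. It is not how hbck or add_table.rb does it; the function name and the (start_key, end_key) tuple layout are made up for illustration. It assumes the usual convention that an empty start key means "start of table" and an empty end key means "end of table".

```python
from typing import List, Tuple

# Hypothetical representation: (start_key, end_key); b"" marks an open end.
Region = Tuple[bytes, bytes]


def find_holes_and_overlaps(regions: List[Region]):
    """Return (holes, overlaps) for the regions of a single table.

    holes    -- list of (from_key, to_key) ranges that no region covers
    overlaps -- list of (region_a, region_b) pairs whose ranges intersect
    """
    # Sort by start key; the empty start key sorts first.
    ordered = sorted(regions, key=lambda r: r[0])
    holes, overlaps = [], []
    if not ordered:
        return holes, overlaps

    # The first region must begin at the start of the table.
    if ordered[0][0] != b"":
        holes.append((b"", ordered[0][0]))

    prev = ordered[0]        # region that reaches furthest so far
    covered_end = prev[1]    # b"" means coverage already reaches end of table
    for cur in ordered[1:]:
        start, end = cur
        if covered_end == b"" or start < covered_end:
            overlaps.append((prev, cur))         # cur starts inside covered range
        elif start > covered_end:
            holes.append((covered_end, start))   # gap before cur
        # Extend coverage if cur reaches further than anything seen so far.
        if covered_end != b"" and (end == b"" or end > covered_end):
            covered_end = end
            prev = cur

    # The last region must run to the end of the table.
    if covered_end != b"":
        holes.append((covered_end, b""))
    return holes, overlaps


if __name__ == "__main__":
    sample = [(b"", b"b"), (b"b", b"d"), (b"c", b"f"), (b"h", b"")]
    print(find_holes_and_overlaps(sample))
```

In the sample, the region starting at "c" overlaps the one ending at "d", and nothing covers "f" to "h". A doubly-assigned region would show up as an overlap of a region with an identical copy of itself.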
In his case, not all tables were honked, just the big ones. I wonder if an improved add_table.rb would work in this case, i.e. do the same rewrite of the .META. content for a single table based off the content in the filesystem rather than trying to fix up the .META. table in place.
Let me try adding add_table.rb to hbck. Let me add an option for running it per table, and then a global option that restores all tables.
Dan sent me the .META. dir content. It looks like this:
-rw-r--r--@ 1 Stack staff 0 Jul 7 08:26 281906331022358506
-rw-r--r--@ 1 Stack staff 94283152 Jul 7 08:26 5233066973300534672
-rw-r--r--@ 1 Stack staff 0 Jul 7 08:26 6803125877105432645
-rw-r--r--@ 1 Stack staff 0 Jul 7 08:26 8650632001596730954
i.e. three zero-length files. I wonder how these were written (I asked him for a dir listing from the actual cluster).
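For a local copy of a directory like the one above, a quick way to pick out the empty files is sketched below. This is only an illustration: it assumes the files sit on the local filesystem (not in HDFS), and the function name is made up.

```python
import os
from typing import List


def zero_length_files(directory: str) -> List[str]:
    """Return the sorted names of regular files in `directory` that are 0 bytes long."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.stat().st_size == 0
    )


if __name__ == "__main__":
    # Lists empty files in the current directory.
    for name in zero_length_files("."):
        print(name)
```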