HBASE-28260

Possible data loss in WAL after RegionServer crash

Details

      Adds a new flag hbase.regionserver.wal.avoid-local-writes. When true (default false), WAL writes avoid placing a block replica on the local DataNode. This improves MTTR and redundancy, but may reduce WAL write performance. Enabling it is recommended; if write performance is a concern, monitor it after enabling.

    Description

      We recently had a production incident:

      1. RegionServer crashes, but local DataNode lives on
      2. WAL lease recovery kicks in
      3. NameNode reconstructs the block during lease recovery (which results in a new genstamp). It chooses the replica on the local DataNode as the primary.
      4. Local DataNode reconstructs the block, so NameNode registers the new genstamp.
      5. Local DataNode and the underlying host die before the new block can be replicated to other replicas.

      This leaves us with a missing block, because the new genstamp block has no replicas. The old replicas still remain, but are considered corrupt due to GENSTAMP_MISMATCH.
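
The sequence above can be illustrated with a toy model. This is only a sketch of the bookkeeping as described in this issue, not HDFS code: the class `GenstampIncidentSketch`, the `Replica` type, the host names, and the genstamp values 1000 and 1001 are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the incident; none of these types are real HDFS classes.
final class GenstampIncidentSketch {

    // Hypothetical stand-in for one replica of the WAL's last block.
    static final class Replica {
        final String host;
        long genstamp;
        boolean alive = true;

        Replica(String host, long genstamp) {
            this.host = host;
            this.genstamp = genstamp;
        }
    }

    // Returns how many replicas are alive AND carry the genstamp the NameNode expects.
    static int usableReplicas(boolean primaryHostDiesEarly) {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica("local-dn", 1000L));    // same host as the crashed RegionServer
        replicas.add(new Replica("remote-dn-1", 1000L));
        replicas.add(new Replica("remote-dn-2", 1000L));

        // Steps 3-4: lease recovery picks the local replica as primary and
        // the NameNode registers the new genstamp.
        Replica primary = replicas.get(0);
        long expectedGenstamp = 1001L;
        primary.genstamp = expectedGenstamp;

        if (primaryHostDiesEarly) {
            // Step 5: the host dies before any other replica receives the new genstamp.
            primary.alive = false;
        } else {
            // Assumed happy path: the remaining replicas catch up to the new genstamp.
            for (Replica r : replicas) {
                r.genstamp = expectedGenstamp;
            }
        }

        int usable = 0;
        for (Replica r : replicas) {
            if (r.alive && r.genstamp == expectedGenstamp) {
                usable++;
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        System.out.println("host dies early: " + usableReplicas(true));
        System.out.println("host survives:   " + usableReplicas(false));
    }
}
```

In this model `usableReplicas(true)` returns 0: the two surviving replicas still hold genstamp 1000, which corresponds to the GENSTAMP_MISMATCH state described above.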

      Thankfully we were able to confirm that the lengths of the corrupt blocks were identical to that of the newly constructed and lost block. Further, the file in question was only 1 block. So we downloaded one of those corrupt block files and used hdfs dfs -put -f to force that block to replace the file in HDFS. So in this case we had no actual data loss, but it could easily have happened if the file had been more than 1 block or the replicas hadn't been fully in sync prior to reconstruction.

      In order to avoid this issue, we should avoid writing WAL blocks to the local datanode. We can use CreateFlag.NO_LOCAL_WRITE for this. Hat tip to weichiu for pointing this out.
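
As a rough illustration of the intended effect on replica placement, here is a simplified sketch. It is not the HDFS block placement policy and does not call any Hadoop API: `WalPlacementSketch`, `chooseTargets`, and the host names are hypothetical, and the `avoidLocalWrites` parameter merely stands in for the proposed behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for replica target selection; the real policy is far more involved.
final class WalPlacementSketch {

    // Picks up to `replication` targets from `dataNodes` for a writer running on `clientHost`.
    static List<String> chooseTargets(List<String> dataNodes, String clientHost,
                                      int replication, boolean avoidLocalWrites) {
        List<String> targets = new ArrayList<>();

        // Without the flag, this sketch puts the first replica on the writer's own host.
        if (!avoidLocalWrites && replication > 0 && dataNodes.contains(clientHost)) {
            targets.add(clientHost);
        }

        // Fill the remaining slots from other hosts, never revisiting the writer's host.
        for (String dn : dataNodes) {
            if (targets.size() >= replication) {
                break;
            }
            if (dn.equals(clientHost)) {
                continue;
            }
            targets.add(dn);
        }
        return targets;
    }

    public static void main(String[] args) {
        List<String> dns = List.of("rs-host", "dn-2", "dn-3", "dn-4");
        System.out.println(chooseTargets(dns, "rs-host", 3, false));
        System.out.println(chooseTargets(dns, "rs-host", 3, true));
    }
}
```

With `avoidLocalWrites` set, no replica lands on the RegionServer's host, so lease recovery after a RegionServer crash cannot pick a primary replica on that host.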

      When reading WALs, we already reorder blocks so as to avoid reading from the local datanode, but avoiding writes there altogether would be better.
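
The read-side reordering mentioned here can be pictured as moving one host to the back of the replica list. The sketch below only illustrates that idea and is not the HBase implementation; `WalReadReorderSketch`, `deprioritize`, and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: push replicas on a given host to the end of the read order.
final class WalReadReorderSketch {

    static List<String> deprioritize(List<String> replicaHosts, String hostToAvoid) {
        List<String> preferred = new ArrayList<>();
        List<String> lastResort = new ArrayList<>();
        for (String host : replicaHosts) {
            if (host.equals(hostToAvoid)) {
                lastResort.add(host);   // still readable, but tried last
            } else {
                preferred.add(host);    // relative order of other hosts is preserved
            }
        }
        preferred.addAll(lastResort);
        return preferred;
    }

    public static void main(String[] args) {
        System.out.println(deprioritize(List.of("rs-host", "dn-2", "dn-3"), "rs-host"));
    }
}
```

Reordering only changes which replica is tried first; the replica on the avoided host still exists, which is why not writing it in the first place is the stronger fix.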

          People

            charlesconnell Charles Connell
            bbeaudreault Bryan Beaudreault