HBase / HBASE-4274

RS should periodically ping its HLog pipeline even if no writes are arriving

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.92.0
    • Fix Version/s: None
    • Component/s: regionserver, wal
    • Labels:
      None

      Description

      If you restart HDFS underneath HBase, when HBase isn't taking any write load, the region servers won't "notice" that there's any problem until the next time they take a write, at which point they will abort (because the pipeline is gone from beneath them). It would be better if they wrote some garbage to their HLog once every few seconds as a sort of keepalive, so they will aggressively abort as soon as there's an issue.
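      The keepalive proposed above could look roughly like the following Java sketch. It is not HBase code: `LogPipeline`, `HLogKeepalive`, `appendKeepalive`, and the abort hook are hypothetical names introduced for illustration, and the real HLog API is not assumed. A timer writes a no-op marker at a fixed interval and invokes the abort hook once, on the first failed write.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the region server's HLog write pipeline. */
interface LogPipeline {
    /** Appends and syncs a no-op marker; throws if the pipeline is broken. */
    void appendKeepalive() throws IOException;
}

/** Pings the pipeline periodically and aborts on the first failure. */
final class HLogKeepalive {
    private final LogPipeline pipeline;
    private final Runnable abortHook; // hypothetical: would trigger the RS abort
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    HLogKeepalive(LogPipeline pipeline, Runnable abortHook) {
        this.pipeline = pipeline;
        this.abortHook = abortHook;
    }

    /** Performs one ping; returns true if the pipeline accepted the write. */
    boolean pingOnce() {
        if (aborted.get()) {
            return false; // already aborted, do not touch the pipeline again
        }
        try {
            pipeline.appendKeepalive();
            return true;
        } catch (IOException e) {
            // Run the abort hook exactly once, even if pings race.
            if (aborted.compareAndSet(false, true)) {
                abortHook.run();
            }
            return false;
        }
    }

    /** Starts a daemon timer that pings every periodMillis milliseconds. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "hlog-keepalive");
                t.setDaemon(true);
                return t;
            });
        timer.scheduleWithFixedDelay(
            this::pingOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    boolean isAborted() {
        return aborted.get();
    }
}
```

      With a period of a few seconds, an idle region server would detect a dead pipeline within roughly one period instead of waiting for the next client write. The trade-off raised in the comments still applies: this makes the server less tolerant of short HDFS interruptions.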

          Activity

          stack added a comment -

          Marking down to major and moving out of 0.96. Bring back in if folks want the RS to die quickly when HDFS goes out from under HBase (though the general tendency seems to be to go the other direction and try to ride over an HDFS outage if possible).

          Lars Hofhansl added a comment -

          No movement, removing from 0.92.

          stack added a comment -

          Moving out of 0.92. This does not seem to be a critical 0.92 issue any more given Gary's work.

          Ted Yu added a comment -

          Gary has addressed rolling restart of DNs.
          Can we move this issue to 0.94?

          Andrew Purtell added a comment -

          In general we should opt for strategies that allow the RS to ride over short DFS interruptions, such as a rolling restart of DNs, a switch reload or failover, or similar. So I lean toward -1 on changes that make the RS more aggressive about terminating in such situations, as long as we also reason carefully about (avoiding) data loss.

          Andrew Purtell added a comment -

          Doesn't HBASE-4222 already address this? It takes a different approach, and arguably a better one. No need to abort if a new HLog pipeline can be established.
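          The alternative described in this comment, establishing a new pipeline instead of aborting, might be sketched as follows. This is not the HBASE-4222 patch and does not use the real HLog API: `LogWriter`, `LogWriterFactory`, `RollingLog`, and `maxRolls` are hypothetical names chosen for illustration. On a failed append the writer is discarded, a fresh one is opened, and the append is retried; the error propagates to the caller (who could then abort) only after all attempts fail.

```java
import java.io.IOException;

/** Hypothetical writer bound to one write pipeline; not the real HLog API. */
interface LogWriter {
    void append(byte[] entry) throws IOException;
}

/** Hypothetical factory that opens a writer on a fresh pipeline. */
interface LogWriterFactory {
    LogWriter open() throws IOException;
}

/** Retries a failed append on a newly opened writer before giving up. */
final class RollingLog {
    private final LogWriterFactory factory;
    private final int maxRolls; // how many replacement writers to try
    private LogWriter writer;   // opened lazily on first append

    RollingLog(LogWriterFactory factory, int maxRolls) {
        if (maxRolls < 0) {
            throw new IllegalArgumentException("maxRolls must be >= 0");
        }
        this.factory = factory;
        this.maxRolls = maxRolls;
    }

    /** Appends the entry, rolling to a new writer after each failure. */
    synchronized void append(byte[] entry) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRolls; attempt++) {
            try {
                if (writer == null) {
                    writer = factory.open();
                }
                writer.append(entry);
                return;
            } catch (IOException e) {
                last = e;
                writer = null; // drop the broken pipeline; next attempt reopens
            }
        }
        throw last; // every attempt failed; the caller decides whether to abort
    }
}
```

          A production version would also need to close the discarded writer and account for edits that were appended but never synced; the sketch leaves both out.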


            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 2
