Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-1754

use TCP keepalives

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 0.20.0, 0.90.0
    • None
    • None

    Description

      If a regionserver crashes while the client is engaged in IPC with it at a vulnerable point in the TCP FSM (ESTABLISHED, no outstanding data to send), the IPC will be stuck waiting "forever" (> 12 hours, etc.). This hoses the client, especially if it is trying to look up a region in META. Worse, it is not possible to restart the regionserver if the hung client is colocated with it on the same host, because the OS will consider port 60020 bound and in use, unless the client is forcibly killed. Killing some types of applications – especially long running processes which can't redo work from a checkpoint but must start over from the beginning – can be very painful. Investigate if TCP keepalives can be enabled at the IPC level.

      Attachments

        1. HBASE-1754.patch
          3 kB
          Andrew Kyle Purtell

        Activity

          People

            apurtell Andrew Kyle Purtell
            apurtell Andrew Kyle Purtell
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: