HBase / HBASE-15984 (subtask of HBASE-15983: Replication improperly discards data from end-of-wal in some cases.)

Given failure to parse a given WAL that was closed cleanly, replay the WAL.

Details

      In some particular deployments, the Replication code believes it has
      reached EOF for a WAL prior to successfully parsing all bytes known to
      exist in a cleanly closed file.

      If an EOF is detected due to parsing or other errors while there are still unparsed bytes before the end-of-file trailer, we now reset the WAL to the very beginning and attempt a clean read-through. Because we will retry these failures indefinitely, two additional changes are made to help with diagnostics:

      * On each retry attempt, a log message like the below will be emitted at the WARN level:
          
            Processing end of WAL file '{}'. At position {}, which is too far away
            from reported file length {}. Restarting WAL reading (see HBASE-15983
            for details).

      * Additional metrics measure the use of this recovery mechanism. They are described in the reference guide.
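
The check described in this release note can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patch: the class, method, and counter names are hypothetical, and treating "too far away" as "before the start of the trailer" is an assumption based on the wording above.

```java
/**
 * Hypothetical sketch of the recovery decision; none of these names
 * are taken from the HBase code base or the attached patch.
 */
class WalEofRecoverySketch {
    // Stand-in for the "additional metrics" mentioned above (name is invented).
    private long restarts = 0;

    /** True if the reader stopped before reaching the trailer of a cleanly closed WAL. */
    static boolean looksLikePrematureEof(long position, long fileLength, long trailerLength) {
        return position < fileLength - trailerLength;
    }

    /** Formats the WARN message quoted in the release note. */
    static String restartWarning(String walPath, long position, long fileLength) {
        return String.format(
            "Processing end of WAL file '%s'. At position %d, which is too far away "
                + "from reported file length %d. Restarting WAL reading (see HBASE-15983 "
                + "for details).",
            walPath, position, fileLength);
    }

    /**
     * Called when the reader reports EOF on a cleanly closed WAL.
     * Returns the position to continue from: 0 to restart from the very
     * beginning, or the current position if the EOF looks genuine.
     */
    long onEof(String walPath, long position, long fileLength, long trailerLength) {
        if (looksLikePrematureEof(position, fileLength, trailerLength)) {
            restarts++;
            System.out.println("WARN " + restartWarning(walPath, position, fileLength));
            return 0L;
        }
        return position;
    }

    long restarts() {
        return restarts;
    }
}
```

Because the retry is unbounded, a caller in this sketch would invoke `onEof` every time the reader reports EOF, which is why the warning and the counter matter for diagnosing a WAL that never parses cleanly.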

    Description

      Subtask providing a general workaround for "underlying reader failed / is in a bad state", limited to the case where a WAL 1) was closed cleanly and 2) we can tell that our current offset ought not to be the end of parseable entries.

      Attachments

        1. HBASE-15984.2.patch
          28 kB
          Sean Busbey
        2. HBASE-15984.1.patch
          11 kB
          Sean Busbey

          People

            busbey Sean Busbey