HBase / HBASE-22168

proc WALs with non-corrupted-but-"corrupted" procedures block WAL archiving forever


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha-1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I've previously reported the bug where we get messages like this when loading the proc WAL:

      2019-04-04 14:43:00,424 ERROR [master/...:becomeActiveMaster] wal.WALProcedureTree: Missing stack id 43459, max stack id is 43460, root procedure is Procedure(pid=43645, ppid=-1, class=org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure)
      

      which results in:

      2019-04-04 14:43:16,176 ERROR [...:17000:becomeActiveMaster] procedure2.ProcedureExecutor: Corrupt pid=43645, state=WAITING:SERVER_CRASH_FINISH, hasLock=false; ServerCrashProcedure server=..., splitWal=true, meta=false
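      The first message comes from a load-time consistency check on the procedure tree. As a minimal sketch, assuming the tree is treated as intact only when the stack ids recorded for a root procedure and its subprocedures cover every value from 0 up to the maximum (the class and method names here are hypothetical, not the actual WALProcedureTree code), a single gap is enough to flag the root procedure, which then shows up as "Corrupt pid=..." in the second message:

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, NOT HBase's WALProcedureTree: assumes a loaded procedure tree is
// valid only if its recorded stack ids form the contiguous range 0..maxStackId.
public final class StackIdCheck {

    /** Returns the smallest missing stack id in 0..max, or -1 if none is missing. */
    public static int findMissingStackId(int[] stackIds) {
        BitSet seen = new BitSet();
        int max = -1;
        for (int id : stackIds) { // stack ids are assumed to be non-negative
            seen.set(id);
            max = Math.max(max, id);
        }
        int firstGap = seen.nextClearBit(0);
        return firstGap <= max ? firstGap : -1;
    }

    public static void main(String[] args) {
        // Illustrative ids for one root procedure; id 3 was never written to the WAL.
        int[] stackIds = {0, 1, 2, 4};
        int missing = findMissingStackId(stackIds);
        if (missing >= 0) {
            int max = Arrays.stream(stackIds).max().getAsInt();
            System.out.println("Missing stack id " + missing + ", max stack id is " + max);
        }
    }
}
```

      Nothing in this check looks at the bytes of the file, which fits the observation that the file itself is not corrupt.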
      

      There is no actual corruption in the file, so it never gets moved to the corrupted files.
      However, as far as I can tell, the tracker does not account for these kinds of procedures (I didn't spend much time looking at the code, though). As a result, we end up with hundreds of proc WALs that are stuck forever because of some ancient file containing these procedures, and that makes master startup take a long time.
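      To illustrate how one old file can pin hundreds of newer ones, here is a hypothetical model (ProcWalArchiveModel, ProcWal and archiveInactive are made-up names, not HBase APIs). It assumes proc WALs are archived strictly oldest-first and that a file can only be archived once none of the pids it holds is still live. A procedure flagged corrupt at load that is never finished or deleted keeps its pid live, so archiving stops at the first file:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical model, NOT HBase's WALProcedureStore: each proc WAL file records the
// pids written to it, and files are assumed to be archived strictly oldest-first.
public final class ProcWalArchiveModel {

    /** One proc WAL file and the procedure ids it contains (names are illustrative). */
    public static final class ProcWal {
        private final String name;
        private final Set<Long> procIds;

        public ProcWal(String name, Set<Long> procIds) {
            this.name = name;
            this.procIds = procIds;
        }

        public String name() { return name; }
        public Set<Long> procIds() { return procIds; }
    }

    /**
     * Archives files from the oldest end while none of their pids is still live.
     * Stops at the first file holding a live pid, so every newer file is kept too.
     * Returns the archived names; the deque is left holding the retained files.
     */
    public static List<String> archiveInactive(Deque<ProcWal> wals, Set<Long> livePids) {
        List<String> archived = new ArrayList<>();
        while (!wals.isEmpty()) {
            ProcWal oldest = wals.peekFirst();
            boolean holdsLivePid = oldest.procIds().stream().anyMatch(livePids::contains);
            if (holdsLivePid) {
                break;
            }
            archived.add(wals.pollFirst().name());
        }
        return archived;
    }

    public static void main(String[] args) {
        Deque<ProcWal> wals = new ArrayDeque<>();
        wals.add(new ProcWal("wal-1", Set.of(43645L))); // holds the "corrupt" pid
        wals.add(new ProcWal("wal-2", Set.of(50001L))); // procedures already finished
        wals.add(new ProcWal("wal-3", Set.of(50002L)));

        // pid=43645 was flagged corrupt at load and is never finished or deleted,
        // so under this model it stays live and pins wal-1 and every newer file.
        Set<Long> livePids = Set.of(43645L);
        System.out.println(archiveInactive(wals, livePids)); // prints []
        System.out.println(wals.size());                     // prints 3
    }
}
```

      The oldest-first rule is what turns one stuck pid into an ever-growing backlog: no newer file can be archived past the file that holds it.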


          People

            Assignee: Unassigned
            Reporter: Sergey Shelukhin (sershe)
            Votes: 0
            Watchers: 6
