  1. HBase
  2. HBASE-23340

HMaster ZooKeeper session for /hbase/replication/rs expired (HBase replication is enabled by default, although we do not use it), so the LogCleaner cannot clean oldWALs, which results in oldWALs growing too large (more than 2 TB)


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.2.3
    • Fix Version/s: 3.0.0-alpha-1, 2.5.0
    • Component/s: master
    • Labels: None
    • Release Note:
      Previously, the LogCleaner chores had their own ZK client. If that client hit a session-expired error, the LogCleaner chore would never succeed again, even though the HMaster kept running. With this change, the LogCleaner chores share the HMaster's underlying ZK client (similar to the HFileCleaner chores). Now, if an unrecoverable session expiration occurs, the HMaster aborts and the cleaner chores are not left behind as zombies.
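The difference described in the release note can be illustrated with a small, self-contained sketch. It does not use HBase or ZooKeeper classes: `FakeZkClient`, `FakeMaster`, and `FakeLogCleaner` are hypothetical stand-ins, and the "abort" is reduced to a flag. It only shows why a chore with a private, unmonitored client keeps failing silently, while a chore on the master's client fails together with the master.

```java
/** Hypothetical stand-in for a ZooKeeper client; not the real ZooKeeper API. */
final class FakeZkClient {
    private boolean expired = false;
    private Runnable onExpired = () -> { };   // by default nobody is notified

    void setExpiryListener(Runnable listener) {
        this.onExpired = listener;
    }

    /** Simulates an unrecoverable session expiration. */
    void expireSession() {
        expired = true;
        onExpired.run();
    }

    /** Any call on an expired session fails, and it keeps failing forever. */
    void checkAlive() {
        if (expired) {
            throw new IllegalStateException("Session expired");
        }
    }
}

/** Hypothetical master: it aborts when its own client's session expires. */
final class FakeMaster {
    final FakeZkClient zk = new FakeZkClient();
    private boolean aborted = false;

    FakeMaster() {
        zk.setExpiryListener(() -> aborted = true);
    }

    boolean isAborted() {
        return aborted;
    }
}

/** Hypothetical cleaner chore; a pass succeeds only while the session is alive. */
final class FakeLogCleaner {
    private final FakeZkClient zk;

    FakeLogCleaner(FakeZkClient zk) {
        this.zk = zk;
    }

    boolean runOnce() {
        try {
            zk.checkAlive();
            return true;
        } catch (IllegalStateException e) {
            return false;   // the pass is skipped; old WALs keep accumulating
        }
    }
}

final class SharedClientDemo {
    public static void main(String[] args) {
        // Before the change: the chore owns a private client that nobody monitors.
        FakeMaster oldMaster = new FakeMaster();
        FakeZkClient privateClient = new FakeZkClient();
        FakeLogCleaner zombie = new FakeLogCleaner(privateClient);
        privateClient.expireSession();
        System.out.println("private client: pass ok=" + zombie.runOnce()
            + ", master aborted=" + oldMaster.isAborted());

        // After the change: the chore reuses the master's client.
        FakeMaster newMaster = new FakeMaster();
        FakeLogCleaner shared = new FakeLogCleaner(newMaster.zk);
        newMaster.zk.expireSession();
        System.out.println("shared client: pass ok=" + shared.runOnce()
            + ", master aborted=" + newMaster.isAborted());
    }
}
```

In the first case the master stays up while every cleaning pass fails, which matches the zombie chore described above. In the second case the failure is visible at the master level, so a restart or failover can restore cleaning.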

    Description

      The HMaster's ZooKeeper session for /hbase/replication/rs expired (HBase replication is enabled by default, although we do not use it). As a result, the LogCleaner cannot clean oldWALs, and oldWALs grows too large (more than 2 TB).

      We could solve this in one of the following ways:

      1) Increase the session timeout. I do not think this is a good idea, because we do not know what value would be suitable.

      2) Disable HBase replication. This is not a good idea either, because some of our users may rely on this feature.

      3) Add a retry limit. For example, once the error has occurred three times, stop the ReplicationLogCleaner and SnapShotCleaner.

      Those are all my ideas. I do not know whether they are suitable; if they are, could I submit a PR?

      Does anyone have a good idea?
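Idea 3 above (a retry limit) could be sketched roughly as follows. This is only an illustration of the counting logic, not code from HBase: `RetryLimitedCleaner` is a hypothetical wrapper, the cleaning pass is modelled as a plain `BooleanSupplier`, and the threshold of three comes from the example in the description. What a stopped cleaner should mean for file deletion is left open here.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper that stops a cleaner after N consecutive failed passes. */
final class RetryLimitedCleaner {
    private final BooleanSupplier cleaningPass;   // returns true when a pass succeeds
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;
    private boolean stopped = false;

    RetryLimitedCleaner(BooleanSupplier cleaningPass, int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("maxConsecutiveFailures must be >= 1");
        }
        this.cleaningPass = cleaningPass;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Runs one pass unless already stopped; returns true only if the pass succeeded. */
    boolean runOnce() {
        if (stopped) {
            return false;
        }
        boolean ok;
        try {
            ok = cleaningPass.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false;   // e.g. a session-expired error surfaced as an exception
        }
        if (ok) {
            consecutiveFailures = 0;   // a success resets the counter
            return true;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxConsecutiveFailures) {
            stopped = true;   // give up instead of retrying forever
        }
        return false;
    }

    boolean isStopped() {
        return stopped;
    }
}

final class RetryLimitDemo {
    public static void main(String[] args) {
        // A pass that always fails, standing in for an expired session.
        RetryLimitedCleaner cleaner = new RetryLimitedCleaner(() -> false, 3);
        for (int i = 1; i <= 4; i++) {
            cleaner.runOnce();
            System.out.println("after pass " + i + ": stopped=" + cleaner.isStopped());
        }
    }
}
```

Counting consecutive failures rather than total failures means that a transient error followed by a success does not bring the cleaner closer to being stopped.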

      Attachments

        1. Snipaste_2019-11-21_10-39-25.png
          111 kB
          Jacky Lau
        2. Snipaste_2019-11-21_14-10-36.png
          133 kB
          Jacky Lau

          People

            Assignee: Bo Cui
            Reporter: Jacky Lau
            Votes: 0
            Watchers: 13
