Project: HBase / Parent: HBASE-1295 Multi data center replication / Issue: HBASE-2070

Collect HLogs and delete them after a period of time


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      For replication we need to be able to service clusters that are a few hours behind in edits. For example, after distcp'ing a snapshot of the DB to another cluster, we need to make sure we get the edits that came in after the snapshot was taken.

      I plan the following changes:

      • Instead of deleting HLogs during a log roll or after a log split, move them to a separate folder where all old logs are aggregated.
      • Add a new configuration option for the maximum age of a log. For a normal cluster I'm thinking of a default of 2 hours; for replication you may want to set it much higher.
      • Create a new thread in the master that checks for logs older than the configured age and deletes them.
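      The three changes above could fit together roughly as follows. This is a minimal sketch, not the patch attached to this issue: it uses the local file system through java.nio.file in place of HDFS, it measures a log's age from its last-modified time (an assumption), and the class name OldLogCleaner, its methods, and the archive directory are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: rolled or split logs are moved into one archive directory
 * instead of being deleted, and a periodic task removes archived logs whose age
 * exceeds a TTL. Uses the local file system in place of HDFS.
 */
final class OldLogCleaner {
  private final Path oldLogDir;  // hypothetical folder where all retired logs are aggregated
  private final long ttlMillis;  // maximum age of an archived log

  OldLogCleaner(Path oldLogDir, long ttlMillis) throws IOException {
    this.oldLogDir = Files.createDirectories(oldLogDir);
    this.ttlMillis = ttlMillis;
  }

  /** Replaces the delete that used to happen on log roll or after log split. */
  Path archive(Path log) throws IOException {
    // On the same file system the move is a rename, which normally keeps the
    // last-modified time, so a log's age is still measured from its last write.
    return Files.move(log, oldLogDir.resolve(log.getFileName()));
  }

  /** One cleaner pass: deletes archived logs older than the TTL, returns how many. */
  int cleanOnce(long nowMillis) throws IOException {
    int deleted = 0;
    try (DirectoryStream<Path> logs = Files.newDirectoryStream(oldLogDir)) {
      for (Path log : logs) {
        long ageMillis = nowMillis - Files.getLastModifiedTime(log).toMillis();
        if (ageMillis > ttlMillis && Files.deleteIfExists(log)) {
          deleted++;
        }
      }
    }
    return deleted;
  }

  /** Runs cleanOnce periodically on a daemon thread, standing in for the master thread. */
  ScheduledExecutorService start(long periodMillis) {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "old-log-cleaner");
      t.setDaemon(true);
      return t;
    });
    exec.scheduleAtFixedRate(() -> {
      try {
        cleanOnce(System.currentTimeMillis());
      } catch (IOException e) {
        // Report and retry on the next tick instead of killing the scheduled task.
        System.err.println("old log cleaning failed: " + e);
      }
    }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    return exec;
  }
}
```

      In this sketch a master would call archive(...) wherever it used to delete a log and start(...) once at startup. Because logs are only moved, a replication peer that is a few hours behind can still read them until the TTL expires.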

      I'd also like the deletion age to be configurable while the cluster is running. I'm also thinking of adding a way to tell the cluster to replay edits on itself.
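      The issue does not say how a new deletion age would reach a running master (an RPC, a configuration reload, a shell command), so the sketch below covers only the part that makes a live change safe: keeping the TTL in a thread-safe holder that the cleaner thread reads on every pass rather than once at startup. The class LogTtl and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical holder for the old-log TTL. The cleaner thread calls isExpired on
 * every pass, so a value set while the cluster is running applies from the next
 * pass onward without a restart.
 */
final class LogTtl {
  private final AtomicLong ttlMillis = new AtomicLong();

  LogTtl(long initialTtlMillis) {
    set(initialTtlMillis);
  }

  /** Changes the TTL at runtime, e.g. raising it while a lagging replica catches up. */
  void set(long newTtlMillis) {
    if (newTtlMillis < 0) {
      throw new IllegalArgumentException("TTL must be non-negative: " + newTtlMillis);
    }
    ttlMillis.set(newTtlMillis);
  }

  long get() {
    return ttlMillis.get();
  }

  /** True if a log last modified at lastModifiedMillis is older than the current TTL. */
  boolean isExpired(long lastModifiedMillis, long nowMillis) {
    return nowMillis - lastModifiedMillis > ttlMillis.get();
  }
}
```

      Raising the TTL only protects logs that have not been deleted yet; logs removed under the old, shorter value cannot be recovered by a later change.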

      Attachments

        1. HBASE-2070.patch
          30 kB
          Jean-Daniel Cryans
        2. HBASE-2070-v2.patch
          37 kB
          Jean-Daniel Cryans
        3. HBASE-2070-v3.patch
          39 kB
          Jean-Daniel Cryans
        4. HBASE-2070-v4.patch
          39 kB
          Jean-Daniel Cryans


          People

            Assignee: Jean-Daniel Cryans
            Reporter: Jean-Daniel Cryans
            Votes: 0
            Watchers: 3
