HBASE-28116

Move snapshot storage from filesystem to a separate HBase table


Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: snapshots

    Description

      As we know, rename and list are very expensive operations on object storage. Currently, snapshots in HBase rely on both of them. For example, when taking a snapshot, we first write the snapshot description and the data manifest file to a temporary directory, then commit the snapshot with a rename. When listing all snapshots, we scan the snapshot directory to find all completed snapshots.
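      The write-then-rename commit and the directory-scan listing described above can be sketched with plain java.nio.file calls. This is only an illustration of the pattern on a local filesystem, not HBase's actual snapshot code: the class name, the ".tmp" working directory, and the "description" / "manifest" file names are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FsSnapshotSketch {
    // Hypothetical layout: <root>/.tmp/<name> holds in-progress snapshots,
    // <root>/<name> holds completed ones.
    static final String TMP_DIR = ".tmp";

    public static void takeSnapshot(Path root, String name, String description, String manifest)
            throws IOException {
        Path working = root.resolve(TMP_DIR).resolve(name);
        Files.createDirectories(working);
        Files.write(working.resolve("description"), description.getBytes(StandardCharsets.UTF_8));
        Files.write(working.resolve("manifest"), manifest.getBytes(StandardCharsets.UTF_8));
        // Commit step: one rename makes the snapshot visible. This is cheap on
        // HDFS or a local disk, but object stores typically emulate it with copy + delete.
        Files.move(working, root.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static List<String> listSnapshots(Path root) throws IOException {
        List<String> names = new ArrayList<>();
        // Listing step: scan the snapshot directory and skip the working directory.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                String fileName = entry.getFileName().toString();
                if (Files.isDirectory(entry) && !fileName.equals(TMP_DIR)) {
                    names.add(fileName);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snapshots");
        takeSnapshot(root, "snap1", "table=t1", "regions=3");
        System.out.println(listSnapshots(root));
    }
}
```

      Both steps depend on exactly the two operations named above: the commit is a directory rename, and the listing is a directory scan.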

      So maybe we can introduce a new snapshot storage that uses an HBase table to store snapshots.
      Here are a few points where we may benefit:
      1. It makes HBase easier to deploy on object storage such as S3.
      2. It will make snapshots faster and more lightweight. In the current filesystem-based snapshot implementation, when consolidating the snapshot manifest, we first list all region manifests with a thread pool, read their content, and then delete them. When the number of regions is large, this process may take a lot of time. In comparison, reads and writes on HBase tables are more lightweight than reads and writes on HDFS files.
      3. It is likely to reduce the number of small files on HDFS.
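      To make the proposal more concrete, here is a minimal sketch of how a sorted key-value table could replace rename and list. It uses an in-memory ConcurrentSkipListMap as a stand-in for the table, so no HBase client API is involved. The row key scheme ("r/<snapshot>/<region>" for region manifests, "s/<snapshot>" for completed snapshots) and all class and method names are hypothetical, and the sketch assumes snapshot names contain no '/'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

public class TableSnapshotSketch {
    // Stand-in for a sorted table: row key -> value. Not an HBase API.
    private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();

    // Each region writes its manifest as one row instead of one small file.
    public void putRegionManifest(String snapshot, String region, String manifest) {
        rows.put("r/" + snapshot + "/" + region, manifest);
    }

    // Consolidation reads all region manifests with one prefix scan,
    // instead of listing a directory and opening every file.
    public List<String> regionManifests(String snapshot) {
        String prefix = "r/" + snapshot + "/";
        return new ArrayList<>(rows.subMap(prefix, prefix + Character.MAX_VALUE).values());
    }

    // Commit is a single row write; no rename is needed.
    public void commit(String snapshot, String description) {
        rows.put("s/" + snapshot, description);
    }

    // Listing completed snapshots is a prefix scan over the commit rows.
    public List<String> listSnapshots() {
        List<String> names = new ArrayList<>();
        for (String key : rows.subMap("s/", "s/" + Character.MAX_VALUE).keySet()) {
            names.add(key.substring("s/".length()));
        }
        return names;
    }

    public static void main(String[] args) {
        TableSnapshotSketch table = new TableSnapshotSketch();
        table.putRegionManifest("snap1", "region-a", "manifest-a");
        table.putRegionManifest("snap1", "region-b", "manifest-b");
        table.commit("snap1", "table=t1");
        System.out.println(table.listSnapshots());
        System.out.println(table.regionManifests("snap1"));
    }
}
```

      In this sketch a snapshot becomes visible only once its commit row is written, which plays the role the rename plays in the filesystem-based implementation.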


People

    • Assignee: Unassigned
    • Reporter: frostruan ruanhui
