Apache Ozone / HDDS-8128

Deduplicate the ops in RDBBatchOperation


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: db

    Description

      In a multipart upload test, the key "testKey" had 1000 parts of 8 KB each. The same key was uploaded 10 times sequentially (i.e. each upload overwrote the previous one) in a newly formatted cluster. The replication factor was 3, so the total raw size of the key was ~24 MB. After the test completed, the OM RocksDB used ~7.5 GB.

      In this JIRA, we add a cache to RDBBatchOperation for deduplication. Within a batch, the put-ops and delete-ops of the same key can be safely deduplicated. Only the last op has to be applied to the db. All the previous ops can be discarded.
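
      The deduplication idea can be illustrated with a minimal Java sketch. This is not the actual RDBBatchOperation code: the class name DedupBatchSketch, its methods, and the in-memory map standing in for the db are all hypothetical, and column families are left out. It only shows that, within one batch, keeping the last op per key is enough.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of per-key op deduplication within a batch. */
final class DedupBatchSketch {
  // Last pending op per key; a null value marks a delete-op.
  private final Map<ByteBuffer, byte[]> lastOps = new LinkedHashMap<>();
  // Number of earlier ops replaced by a later op on the same key.
  private int discarded;

  void put(byte[] key, byte[] value) {
    record(key, value.clone());
  }

  void delete(byte[] key) {
    record(key, null);
  }

  private void record(byte[] key, byte[] valueOrNull) {
    // ByteBuffer compares by content, so it can serve as a map key.
    ByteBuffer k = ByteBuffer.wrap(key.clone());
    if (lastOps.containsKey(k)) {
      discarded++; // the previous op on this key is never applied
    }
    lastOps.put(k, valueOrNull);
  }

  int pendingOps() {
    return lastOps.size();
  }

  int discardedOps() {
    return discarded;
  }

  /** Applies only the surviving ops to the stand-in db, then resets the batch. */
  void commit(Map<ByteBuffer, byte[]> db) {
    for (Map.Entry<ByteBuffer, byte[]> e : lastOps.entrySet()) {
      if (e.getValue() == null) {
        db.remove(e.getKey());
      } else {
        db.put(e.getKey(), e.getValue());
      }
    }
    lastOps.clear();
  }

  public static void main(String[] args) {
    DedupBatchSketch batch = new DedupBatchSketch();
    byte[] key = "testKey".getBytes(StandardCharsets.UTF_8);
    // Ten overwrites of the same key inside one batch.
    for (int i = 0; i < 10; i++) {
      batch.put(key, new byte[] {(byte) i});
    }
    System.out.println("pending=" + batch.pendingOps()
        + " discarded=" + batch.discardedOps());
    Map<ByteBuffer, byte[]> db = new HashMap<>();
    batch.commit(db);
    System.out.println("db entries=" + db.size());
  }
}
```

      In this sketch, ten puts to "testKey" in one batch leave a single pending op and nine discarded ones, so only the last value reaches the stand-in db. A put followed by a delete of the same key collapses to just the delete.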

            People

              Assignee: szetszwo Tsz-wo Sze
              Reporter: szetszwo Tsz-wo Sze