Apache Ozone / HDDS-3658

Stop persisting the container-related pipeline info of each key into the OM DB to reduce DB size


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None

    Description

      An investigation of serialized key sizes, using RATIS replication with three replicas. The following examples are quoted from the output of the "ozone sh key info" command, which does not show the pipeline information associated with each key location element.

      1. Empty key, serialized size 113 bytes
      hadoop/bucket/user/root/terasort/10G-input-7/_SUCCESS
      {
      "volumeName" : "hadoop",
      "bucketName" : "bucket",
      "name" : "user/root/terasort/10G-input-7/_SUCCESS",
      "dataSize" : 0,
      "creationTime" : "2019-11-21T13:53:11.330Z",
      "modificationTime" : "2019-11-21T13:53:11.361Z",
      "replicationType" : "RATIS",
      "replicationFactor" : 3,
      "ozoneKeyLocations" : [ ],
      "metadata" : { },
      "fileEncryptionInfo" : null
      }

      2. Key with one chunk of data, serialized size 661 bytes
      hadoop/bucket/user/root/terasort/10G-input-6/part-m-00037
      {
      "volumeName" : "hadoop",
      "bucketName" : "bucket",
      "name" : "user/root/terasort/10G-input-6/part-m-00037",
      "dataSize" : 223696200,
      "creationTime" : "2019-11-18T07:47:58.254Z",
      "modificationTime" : "2019-11-18T07:53:52.066Z",
      "replicationType" : "RATIS",
      "replicationFactor" : 3,
      "ozoneKeyLocations" : [
        { "containerID" : 7, "localID" : 103157811003588713, "length" : 223696200, "offset" : 0 }
      ],
      "metadata" : { },
      "fileEncryptionInfo" : null
      }

      3. Key with two chunks of data, serialized size 1205 bytes
      ozone sh key info hadoop/bucket/user/root/terasort/10G-input-7/part-m-00027
      {
      "volumeName" : "hadoop",
      "bucketName" : "bucket",
      "name" : "user/root/terasort/10G-input-7/part-m-00027",
      "dataSize" : 223696200,
      "creationTime" : "2019-11-21T13:47:07.653Z",
      "modificationTime" : "2019-11-21T13:53:07.964Z",
      "replicationType" : "RATIS",
      "replicationFactor" : 3,
      "ozoneKeyLocations" : [
        { "containerID" : 221, "localID" : 103176210196201501, "length" : 134217728, "offset" : 0 },
        { "containerID" : 222, "localID" : 103176231767375926, "length" : 89478472, "offset" : 0 }
      ],
      "metadata" : { },
      "fileEncryptionInfo" : null
      }
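      Taking the three serialized sizes above at face value, the cost of each extra key location can be estimated by differencing them. This is a minimal sketch that uses only the byte counts quoted in this issue; treating the size as a fixed base plus a roughly constant cost per location is an assumption, not a separate measurement.

```python
# Serialized key sizes quoted above, indexed by the number of key locations.
measured_sizes = {0: 113, 1: 661, 2: 1205}


def per_location_overhead(sizes):
    """Return the byte increase contributed by each additional key location."""
    counts = sorted(sizes)
    return [sizes[b] - sizes[a] for a, b in zip(counts, counts[1:])]


deltas = per_location_overhead(measured_sizes)
print(deltas)  # [548, 544]
```

      Each key location adds roughly 545 bytes, while the fields "ozone sh key info" prints for a location are just four numbers. That gap suggests most of the per-location cost comes from data the command does not display, such as the pipeline information.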

      When a client reads a key, the "refreshPipeline" option controls whether the up-to-date container location info is fetched from SCM.
      Currently, this option is always set to true, which makes the container location info saved in the OM DB useless.
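      The effect of always refreshing can be sketched as follows. This is a hypothetical model, not Ozone's actual read path: KeyLocation, lookup_key and the scm_pipelines mapping are illustrative names, and a pipeline is represented as a plain string. When refresh_pipeline is true, whatever pipeline was stored with the location is overwritten by the one SCM reports, so the stored value is never used.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class KeyLocation:
    container_id: int
    local_id: int
    length: int
    offset: int
    pipeline: Optional[str] = None  # hypothetical stand-in for pipeline info


def lookup_key(stored: List[KeyLocation], scm_pipelines: Dict[int, str],
               refresh_pipeline: bool) -> List[KeyLocation]:
    """Hypothetical read path: with refresh_pipeline=True, each stored
    pipeline is replaced by the one SCM currently reports for the container."""
    if not refresh_pipeline:
        return stored
    return [replace(loc, pipeline=scm_pipelines[loc.container_id])
            for loc in stored]


stored = [KeyLocation(7, 103157811003588713, 223696200, 0, "stale-pipeline")]
fresh = lookup_key(stored, {7: "current-pipeline"}, refresh_pipeline=True)
print(fresh[0].pipeline)  # current-pipeline
```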

      Another motivation: when using Nanda's tool for the OM performance test with 1 billion (1000 million) keys, each key with 1 replica and 2 chunks of metadata, the total RocksDB directory size is 65.5GB. One of our customers' clusters needs to store 10 billion objects. In this case, the DB size is approximately 65.5GB * 10 / 2 * 3 = 982.5GB, i.e. about 1TB.
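      The size estimate is plain arithmetic and can be checked directly. The "* 10" scales from the 1 billion tested keys to the 10 billion target objects; the "/ 2" and "* 3" factors are taken as-is from the estimate in this issue, and their breakdown is not spelled out here.

```python
# Figures quoted above from the OM performance test.
measured_db_size_gb = 65.5  # 1 billion keys, 1 replica, 2 chunks of metadata each
scale_factor = 10           # 10 billion target objects vs. 1 billion tested

# The "/ 2" and "* 3" adjustments are copied from the original estimate.
estimated_gb = measured_db_size_gb * scale_factor / 2 * 3
print(estimated_gb)  # 982.5
```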

      The goal of this task is to discard the container location info when persisting a key to the OM DB, to save DB space.
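      A minimal sketch of the idea, assuming a hypothetical KeyLocation record in which the pipeline is an optional field: before the key is written to the OM DB, the pipeline is dropped from every location, while the container ID is kept so that the pipeline can still be looked up from SCM on read. The names here are illustrative and are not Ozone's actual classes.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class KeyLocation:
    container_id: int
    local_id: int
    length: int
    offset: int
    pipeline: Optional[str] = None  # hypothetical stand-in for pipeline info


def strip_pipelines_for_persist(locations: List[KeyLocation]) -> List[KeyLocation]:
    """Return copies of the locations without pipeline info, ready to persist.
    The container ID is kept so the pipeline can be resolved via SCM later."""
    return [replace(loc, pipeline=None) for loc in locations]


locs = [
    KeyLocation(221, 103176210196201501, 134217728, 0, "pipeline-a"),
    KeyLocation(222, 103176231767375926, 89478472, 0, "pipeline-a"),
]
persisted = strip_pipelines_for_persist(locs)
print([loc.pipeline for loc in persisted])  # [None, None]
```

      Returning copies instead of mutating the input keeps the in-memory locations (with their pipelines) usable for the current write, while only the slimmed-down form is persisted.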


      People

      Assignee: Sammi Chen
      Reporter: Sammi Chen
      Votes: 0
      Watchers: 5
