  1. Hadoop HDFS
  2. HDFS-7240 Scaling HDFS
  3. HDFS-12506

Ozone: ListBucket is too slow


Details

    • Reviewed

    Description

      Generated 3 million keys in Ozone, then ran the listBucket command to list the buckets under a volume:

      bin/hdfs oz -listBucket http://15oz1.fyre.ibm.com:9864/vol-0-15143 -user wwei
      

      This call took over 15 seconds to finish. The problem is caused by the inflexible structure of the KSM DB. Currently ksm.db stores keys as follows:

      /v1/b1
      /v1/b1/k1
      /v1/b1/k2
      /v1/b1/k3
      /v1/b2
      /v1/b2/k1
      /v1/b2/k2
      /v1/b2/k3
      /v1/b3
      /v1/b4
      

      Keys are sorted in natural order, so to list the buckets under a volume, e.g. /v1, we need to seek to the /v1 position and then iterate and filter keys, which ends up scanning all keys under volume /v1. The problem with this design is that there is no efficient way to locate all buckets without scanning the keys.
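
      The cost of the seek-then-filter approach can be illustrated with a small sketch. This is not the KSM code: it models ksm.db as a sorted in-memory list, and the helper names `build_keys` and `list_buckets_by_scan` are hypothetical. It only shows that the number of entries visited grows with the number of keys, not with the number of buckets.

```python
import bisect


def build_keys(volume, num_buckets, keys_per_bucket):
    """Build a sorted key list that mimics the layout shown above.

    Hypothetical helper: a real store would keep these in an on-disk
    sorted key-value DB rather than a Python list.
    """
    keys = []
    for b in range(1, num_buckets + 1):
        bucket = f"/{volume}/b{b}"
        keys.append(bucket)  # the bucket entry itself
        keys.extend(f"{bucket}/k{k}" for k in range(1, keys_per_bucket + 1))
    return sorted(keys)


def list_buckets_by_scan(sorted_keys, volume):
    """Seek to the volume prefix, then iterate and filter for bucket entries.

    Returns (buckets, visited), where visited counts every entry touched
    under the volume, including all object keys.
    """
    prefix = f"/{volume}/"
    start = bisect.bisect_left(sorted_keys, prefix)  # the "seek"
    buckets, visited = [], 0
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # left the volume's key range
        visited += 1
        # A bucket entry has no further "/" after the volume prefix.
        if "/" not in key[len(prefix):]:
            buckets.append(key)
    return buckets, visited


# Two volumes, so the scan also has to stop at the volume boundary.
keys = sorted(build_keys("v1", 4, 1000) + build_keys("v2", 2, 10))
buckets, visited = list_buckets_by_scan(keys, "v1")
print(buckets)   # the 4 bucket entries under /v1
print(visited)   # 4004: 4 buckets plus 4 * 1000 object keys
```

      In this model, returning 4 buckets requires visiting 4004 entries. With 3 million keys under a volume, the same pattern touches every one of them.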

      Attachments

        1. HDFS-12506-HDFS-7240.001.patch
          5 kB
          Weiwei Yang
        2. HDFS-12506-HDFS-7240.002.patch
          7 kB
          Weiwei Yang
        3. HDFS-12506-HDFS-7240.003.patch
          11 kB
          Weiwei Yang
        4. HDFS-12506-HDFS-7240.004.patch
          16 kB
          Weiwei Yang
        5. HDFS-12506-HDFS-7240.005.patch
          17 kB
          Weiwei Yang
        6. HDFS-12506-HDFS-7240.006.patch
          26 kB
          Weiwei Yang
        7. HDFS-12506-HDFS-7240.007.patch
          26 kB
          Weiwei Yang

        Activity

          People

            cheersyang Weiwei Yang
            cheersyang Weiwei Yang
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated:
              Resolved: