HBase / HBASE-2531

32-bit encoding of region names is way too susceptible to hash clashes


Details

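    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • None
    • Hadoop Flags: Reviewed
    • Release Note: Changes the format of the region name by adding an MD5 suffix. The suffix is now used as the region's directory name in the filesystem.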

    Description

      Kannan tripped over two region names that hash to the same value. The following code demonstrates that his two names produce the same hash:

      package org;
      
      import org.apache.hadoop.hbase.util.Bytes;
      import org.apache.hadoop.hbase.util.JenkinsHash;
      
      
      public class Testing {
        public static void main(final String [] args) {
          System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
          System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
        }
      
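        /**
         * Old encoding scheme: the encoded name is the absolute value of a
         * 32-bit Jenkins hash of the full region name.
         *
         * @param regionName the full region name, e.g. "table,startKey,regionId"
         * @return the 32-bit encoded name
         */
        public static int encodeRegionName(final byte [] regionName) {
          // Only 2^32 possible hash values (roughly 2^31 after Math.abs), so distinct
          // region names collide far too easily. Math.abs(Integer.MIN_VALUE) also stays negative.
          return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
        }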
      }
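Why a clash like this is expected rather than bad luck can be seen from the birthday bound. The sketch below assumes the hash behaves like a uniform random function over roughly 2^31 values (the 32-bit output folded by Math.abs) and uses the standard approximation p ≈ 1 - exp(-n(n-1) / (2N)). The class and method names are hypothetical, chosen only for illustration.

```java
public class BirthdayBound {
    /**
     * Approximate probability that at least two of n names share a hash value,
     * assuming a uniform hash over a space of the given size (birthday bound).
     */
    static double collisionProbability(long n, double space) {
        // p ~ 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision when the exponent is tiny
        return -Math.expm1(-(double) n * (n - 1) / (2.0 * space));
    }

    public static void main(String[] args) {
        // Assumption: Math.abs folds the 32-bit hash into roughly 2^31 distinct values
        double space = Math.pow(2, 31);
        long[] counts = {1_000L, 10_000L, 50_000L, 100_000L};
        for (long n : counts) {
            System.out.printf("%,d regions -> p(clash) ~ %.4f%n", n, collisionProbability(n, space));
        }
    }
}
```

Under these assumptions, a cluster with a few tens of thousands of regions (counted over its whole history, since split parents also had names) already has a substantial chance of some clash, and by about 100,000 names a clash is very likely.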
      

      We need a new encoding mechanism, and existing regions will have to be migrated to the new scheme.
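The release note says the fix adds an MD5 suffix to the region name and uses that suffix as the region's directory name. The sketch below illustrates that idea using only the JDK's MessageDigest. It assumes, for illustration, that the suffix is the lowercase hex MD5 of the old-style "table,startKey,regionId" name and that it is delimited by dots; the helper names (md5Hex, newRegionName, encodedName) and the exact delimiter are hypothetical and are not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5RegionName {
    /** Returns the MD5 digest of the input as 32 lowercase hex characters. */
    static String md5Hex(byte[] input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(input);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical helper: appends ".<md5hex>." to an old-style region name. */
    static String newRegionName(String oldName) {
        return oldName + "." + md5Hex(oldName.getBytes(StandardCharsets.UTF_8)) + ".";
    }

    /** Hypothetical helper: extracts the MD5 suffix that would serve as the directory name. */
    static String encodedName(String regionName) {
        int end = regionName.length() - 1;                     // index of the trailing '.'
        int start = regionName.lastIndexOf('.', end - 1) + 1;  // first char after the separator
        return regionName.substring(start, end);
    }

    public static void main(String[] args) {
        // The two names from the description that clash under the 32-bit Jenkins encoding
        String a = "test1,6838000000,1273541236167";
        String b = "test1,0520100000,1273541610201";
        System.out.println(encodedName(newRegionName(a)));
        System.out.println(encodedName(newRegionName(b)));
    }
}
```

With a 128-bit digest the two names from the description get different encoded names, and the birthday bound for that space is negligible at any realistic region count. MD5 is used here for collision resistance on non-adversarial input, not for security.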

      Attachments

        1. HBASE-2531_v2.patch
          34 kB
          Kannan Muthukkaruppan


            People

              Assignee: Kannan Muthukkaruppan (kannanm)
              Reporter: Michael Stack (stack)
              Votes: 0
              Watchers: 5
