HBase / HBASE-2531

32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changes format of region name. Adds an md5 suffix. Suffix is now the name used as directory name in filesystem.

      Description

      Kannan tripped over two region names that hashed to the same encoded name.

      Here is code demonstrating that his two names hash the same:

      package org;
      
      import org.apache.hadoop.hbase.util.Bytes;
      import org.apache.hadoop.hbase.util.JenkinsHash;
      
      
      public class Testing {
        public static void main(final String [] args) {
          System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
          System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
        }
      
        /**
         * @param regionName full region name as bytes (table,startkey,regionid)
         * @return the encoded name: absolute value of the 32-bit Jenkins hash
         */
        public static int encodeRegionName(final byte [] regionName) {
          return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
        }
      }
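
      To put a rough number on "too susceptible": the sketch below applies the standard birthday approximation to a hash space of about 2^31 values (Math.abs folds the 32-bit hash onto the non-negative ints). It assumes an idealized, uniformly distributed hash, which is an assumption and not something measured against JenkinsHash; the class and method names are hypothetical.

```java
final class CollisionOddsSketch {
  /**
   * Approximate probability that at least two of n names share a hash value
   * when hashes are uniform over `space` possible values (birthday bound).
   */
  static double collisionProbability(long n, double space) {
    // p ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps precision for small p.
    return -Math.expm1(-((double) n * (n - 1)) / (2.0 * space));
  }

  public static void main(String[] args) {
    // Assumption: Math.abs leaves roughly 2^31 distinct encoded names.
    double space = Math.pow(2, 31);
    for (long n : new long[] {1_000, 10_000, 100_000}) {
      System.out.printf("%d regions -> p(collision) ~ %.4f%n",
          n, collisionProbability(n, space));
    }
  }
}
```

      Under that assumption, a clash somewhere in the cluster goes from unlikely at a thousand regions to more likely than not at around a hundred thousand.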
      

      Need new encoding mechanism. Will need to migrate old regions to new schema.
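
      One possible shape for the new encoding, following the release note's mention of an md5 suffix: hash the region name with MD5 and use the 32-character hex digest as the encoded name. This is only a sketch using the JDK's MessageDigest; the class and method names are hypothetical, and exactly which bytes the attached patch hashes is not shown here, so hashing the full name is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5EncodedNameSketch {
  /** Returns the MD5 digest of the given bytes as 32 lowercase hex characters. */
  static String md5Hex(final byte[] regionName) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(regionName);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        sb.append(String.format("%02x", b & 0xff)); // two hex chars per byte
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  /** Convenience overload: encodes the UTF-8 bytes of a region name string. */
  static String encode(final String regionName) {
    return md5Hex(regionName.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(final String[] args) {
    // The two names from the description, which clash under the 32-bit scheme.
    System.out.println(encode("test1,6838000000,1273541236167"));
    System.out.println(encode("test1,0520100000,1273541610201"));
  }
}
```

      A 128-bit digest makes accidental clashes between region names negligible in practice, at the cost of longer directory names.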

        Attachments

        1. HBASE-2531_v2.patch
          34 kB
          Kannan Muthukkaruppan

              People

              • Assignee: Kannan Muthukkaruppan (kannanm)
              • Reporter: stack
              • Votes: 0
              • Watchers: 5
