HBase / HBASE-2531

32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changes the format of the region name by adding an MD5 suffix. The suffix is now the name used as the region's directory name in the filesystem.

      Description

      Kannan tripped over two region names that hash to the same value. Here is code demonstrating the collision:

      package org;
      
      import org.apache.hadoop.hbase.util.Bytes;
      import org.apache.hadoop.hbase.util.JenkinsHash;
      
      
      public class Testing {
        public static void main(final String [] args) {
          System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
          System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
        }
      
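        /**
         * Mirrors the 32-bit encoding under discussion: the absolute value of
         * the Jenkins hash of the region name.
         *
         * @param regionName full region name as bytes
         * @return the encoded name
         */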
        public static int encodeRegionName(final byte [] regionName) {
          return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
        }
      }
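
      To put a rough number on how likely such clashes are, here is a minimal sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1)/(2N)). It assumes the hash is uniformly distributed and that taking Math.abs of a 32-bit value leaves about N = 2^31 distinct encoded names. The class and method names are hypothetical and not part of HBase.

```java
class BirthdayBoundSketch {

  /**
   * Approximate probability that at least two of n names collide when each
   * is hashed uniformly into a space of the given size (birthday bound).
   */
  static double collisionProbability(long n, double space) {
    // 1 - exp(-x), written with expm1 for accuracy when x is small.
    return -Math.expm1(-(double) n * (n - 1) / (2.0 * space));
  }

  public static void main(String[] args) {
    // Assumption: Math.abs(int) leaves roughly 2^31 distinct values.
    double space = Math.pow(2, 31);
    for (long n : new long[] {1_000L, 10_000L, 100_000L}) {
      System.out.printf("%d regions: p = %.4f%n", n, collisionProbability(n, space));
    }
  }
}
```

      Under these assumptions the chance of at least one clash is already roughly 2% at 10,000 regions and grows quickly from there.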
      

      Need new encoding mechanism. Will need to migrate old regions to new schema.
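
      As an illustration of the direction the release note describes (an MD5-based suffix), here is a minimal sketch that encodes a region name as the hex string of its MD5 digest, using only the JDK. It is not the code from the attached patch, and the exact suffix format HBase adopted may differ. The class name Md5RegionNameSketch and its method are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5RegionNameSketch {

  /** Returns the 128-bit MD5 digest of the region name as 32 lowercase hex characters. */
  static String encodeRegionName(byte[] regionName) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      byte[] digest = md5.digest(regionName);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        // Mask to an unsigned value so each byte prints as exactly two hex digits.
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is a required algorithm on every Java platform, so this is not expected.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  public static void main(String[] args) {
    // The two region names from the description above.
    String a = encodeRegionName("test1,6838000000,1273541236167".getBytes(StandardCharsets.UTF_8));
    String b = encodeRegionName("test1,0520100000,1273541610201".getBytes(StandardCharsets.UTF_8));
    System.out.println(a);
    System.out.println(b);
    System.out.println("distinct: " + !a.equals(b));
  }
}
```

      A 128-bit digest makes accidental clashes between region names vastly less likely than with a 32-bit hash, at the cost of longer directory names.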

      1. HBASE-2531_v2.patch
        34 kB
        Kannan Muthukkaruppan

            People

            • Assignee:
              Kannan Muthukkaruppan
              Reporter:
              stack