Index: src/docbkx/performance.xml
===================================================================
--- src/docbkx/performance.xml (revision 1415408)
+++ src/docbkx/performance.xml (working copy)
@@ -303,35 +303,27 @@
 Table Creation: Pre-Creating Regions
-Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions. Be somewhat conservative in this, because too-many regions can actually degrade performance. An example of pre-creation using hex-keys is as follows (note: this example may need to be tweaked to the individual applications keys):
+Tables in HBase are initially created with one region by default. For bulk imports, this means that all clients will write to the same region
+until it is large enough to split and become distributed across the cluster. A useful pattern to speed up the bulk import process is to pre-create empty regions.
+ Be somewhat conservative in this, because too many regions can actually degrade performance.
+ There are two different approaches to pre-creating splits. The first approach is to rely on the default HBaseAdmin strategy
+ (which is implemented in Bytes.split)...
+
+
+byte[] startKey = ...; // your lowest key
+byte[] endKey = ...; // your highest key
+int numberOfRegions = ...; // # of regions to create
+admin.createTable(table, startKey, endKey, numberOfRegions);
+
+ And the other approach is to define the splits yourself...
+
+
+byte[][] splits = ...; // create your own splits
+admin.createTable(table, splits);
+
-public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
-throws IOException {
-  try {
-    admin.createTable( table, splits );
-    return true;
-  } catch (TableExistsException e) {
-    logger.info("table " + table.getNameAsString() + " already exists");
-    // the table already exists...
-    return false;
-  }
-}
-
-public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
-  byte[][] splits = new byte[numRegions-1][];
-  BigInteger lowestKey = new BigInteger(startKey, 16);
-  BigInteger highestKey = new BigInteger(endKey, 16);
-  BigInteger range = highestKey.subtract(lowestKey);
-  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
-  lowestKey = lowestKey.add(regionIncrement);
-  for(int i=0; i < numRegions-1;i++) {
-    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
-    byte[] b = String.format("%016x", key).getBytes();
-    splits[i] = b;
-  }
-  return splits;
-}
+ See for issues related to understanding your keyspace and pre-creating regions.
Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1415408)
+++ src/docbkx/book.xml (working copy)
@@ -775,6 +775,34 @@
 Lesson #2: While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split tables as long as all the created regions are accessible in the keyspace.
+ To conclude this example, the following shows how appropriate splits can be pre-created for hex-keys:
+
+public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
+throws IOException {
+  try {
+    admin.createTable( table, splits );
+    return true;
+  } catch (TableExistsException e) {
+    logger.info("table " + table.getNameAsString() + " already exists");
+    // the table already exists...
+    return false;
+  }
+}
+
+public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
+  byte[][] splits = new byte[numRegions-1][];
+  BigInteger lowestKey = new BigInteger(startKey, 16);
+  BigInteger highestKey = new BigInteger(endKey, 16);
+  BigInteger range = highestKey.subtract(lowestKey);
+  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
+  lowestKey = lowestKey.add(regionIncrement);
+  for(int i=0; i < numRegions-1;i++) {
+    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
+    byte[] b = String.format("%016x", key).getBytes();
+    splits[i] = b;
+  }
+  return splits;
+}
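
The split arithmetic in getHexSplits can be checked without a running cluster. The following is a minimal standalone sketch of the same computation with no HBase dependency. The class name HexSplitSketch and the sample key range are assumptions made for illustration and are not part of the patch. US-ASCII is substituted for the platform-default charset that getBytes() uses in the patch.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone class, not part of the patch: it mirrors the
// arithmetic of getHexSplits so the split points can be inspected offline.
public class HexSplitSketch {

  // Returns numRegions-1 split keys, evenly spaced between startKey and
  // endKey (both hex strings), each rendered as 16 zero-padded hex digits.
  public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
    byte[][] splits = new byte[numRegions - 1][];
    BigInteger lowestKey = new BigInteger(startKey, 16);
    BigInteger highestKey = new BigInteger(endKey, 16);
    BigInteger range = highestKey.subtract(lowestKey);
    BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
    for (int i = 0; i < numRegions - 1; i++) {
      // The (i + 1)-th boundary sits (i + 1) increments above the lowest key.
      BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i + 1)));
      // Assumption: US-ASCII instead of the platform-default charset.
      splits[i] = String.format("%016x", key).getBytes(StandardCharsets.US_ASCII);
    }
    return splits;
  }

  public static void main(String[] args) {
    // Sample range 0x0..0x100 divided into 4 regions (illustrative values).
    for (byte[] split : getHexSplits("0", "100", 4)) {
      System.out.println(new String(split, StandardCharsets.US_ASCII));
    }
  }
}
```

With the sample range, the sketch prints three split keys (0000000000000040, 0000000000000080, 00000000000000c0), which would yield four regions. Note that the width of 16 hex digits is fixed by the format string, so the resulting regions are only reachable if the application's row keys are hex strings of the same width, which is the point of Lesson #2.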