Description
STEP 1. Install and configure Tez on YARN
STEP 2. Configure Hive for Tez
STEP 3. Create test tables in Hive and fill them with data
Enable dynamic partitioning in Hive. Add to hive-site.xml and restart Hive.
<!-- DYNAMIC PARTITION -->
<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>2000</value>
</property>
<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2000</value>
</property>
Execute on the command line:
hadoop fs -put tempsource.data /
Execute in the Hive shell. Use the attached file tempsource.data.
hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM tempsource;
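If the attached tempsource.data is not at hand, a stand-in file could be generated before the hadoop fs -put command. The sketch below is only an assumption about what suitable data looks like: the contents of the real attachment are not shown in this report. It emits comma-separated rows matching the tempsource schema (x INT, y STRING, z STRING). The function name gen_tempsource, the default row count, the "row<N>" values for y, and the choice of 100 distinct z values (which includes z=66 and stays well under the 2000 dynamic partition limit) are all hypothetical.

```shell
# Hypothetical stand-in generator for tempsource.data (NOT the real attachment).
# Usage: gen_tempsource ROWS PARTITIONS
# Prints ROWS lines of the form x,y,z where z = x modulo PARTITIONS.
gen_tempsource() (
  rows="${1:-10000}"        # assumed row count; the original file size is unknown
  partitions="${2:-100}"    # number of distinct z values; keep it below 2000
  x=1
  while [ "$x" -le "$rows" ]; do
    printf '%d,row%d,%d\n' "$x" "$x" "$((x % partitions))"
    x=$((x + 1))
  done
)

# Example (not executed here):
#   gen_tempsource 10000 100 > tempsource.data
```

Because z is derived from x, every partition receives rows that satisfy the x > 5 and x < 1000 filter used later in STEP 7, as long as PARTITIONS is well below 1000.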
STEP 4. Mount NFS on cluster
STEP 5. Run the TeraGen test application
Use a separate console.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 -Dmapreduce.map.cpu.vcores=0 1000000000 /user/hdfs/input
STEP 6. Create many test files
Use a separate console.
cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
for i in `seq 1 10000`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1; done
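The file-creation loop above can also be wrapped in a small function so that the target directory and file count are explicit and a failed cd does not scatter files into the wrong directory. This is a sketch, not part of the original reproduction: the name make_test_files is hypothetical, and the block size is written in bytes (1048576) instead of 1M only because that spelling is accepted by both GNU and BSD dd.

```shell
# make_test_files DIR COUNT
# Writes COUNT files named tempfile1..tempfileCOUNT, each 1 MiB of random data, into DIR.
# Exits non-zero if DIR cannot be entered or a dd call fails.
make_test_files() (
  dir="$1"
  count="$2"
  cd "$dir" || exit 1
  i=1
  while [ "$i" -le "$count" ]; do
    dd if=/dev/urandom of="tempfile$i" bs=1048576 count=1 2>/dev/null || exit 1
    i=$((i + 1))
  done
)

# Example matching STEP 6 (not executed here):
#   make_test_files /hdfs/cluster/user/hive/warehouse/ptest1/z=66 10000
```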
STEP 7. Run the following query repeatedly in another console
Use a separate console.
hive> insert overwrite table test3
    > select x,y from (
    >   select x,y,z from (
    >     select x,y,z from ptest1 where x > 5 and x < 1000
    >     union all
    >     select x,y,z from ptest1 where x > 5 and x < 1000
    >   ) a
    > ) b;
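Since the failure only shows up after a number of runs, repeating the query by hand can be replaced with a small retry loop. The following is a minimal sketch under stated assumptions: run_until_failure is a hypothetical helper, and query.sql is a hypothetical file containing the STEP 7 query, passed to the Hive CLI with its -f option.

```shell
# run_until_failure MAX CMD [ARGS...]
# Runs CMD up to MAX times and stops at the first non-zero exit status.
# Prints which attempt failed (exit status 1) or that no failure occurred (exit status 0).
run_until_failure() (
  max="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on attempt $attempt"
      exit 1
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in $max attempts"
)

# Example (not executed here; query.sql is a hypothetical file holding the query):
#   run_until_failure 100 hive -f query.sql
```

The attempt number gives a rough measure of how long the reproduction takes while the TeraGen job and the file-creation loop are running in the other consoles.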
After running for some time, the query fails with the following exception:
Status: Failed
Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01, diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed, vertex=vertex_1443452487059_0426_1_01 [Map 3], java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
]
Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask