[TEZ-3074] Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.5.3
Fix Version/s: 0.5.3
Component/s: None
Labels: None
Target Version/s: 0.5.3
Flags: Important

Description

STEP 1. Install and configure Tez on YARN

STEP 2. Configure Hive for Tez

STEP 3. Create test tables in Hive and fill them with data

Enable dynamic partitioning in Hive by adding the following properties to hive-site.xml, then restart Hive.

<!-- DYNAMIC PARTITION -->

<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>

<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>

<property>
  <name>hive.exec.max.dynamic.partitions.pernode</name>
  <value>2000</value>
</property>

<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2000</value>
</property>
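
For a quick check without editing hive-site.xml, the same four properties can usually be supplied per session through the Hive CLI's --hiveconf option. The following is only a sketch: the helper name dynpart_hiveconf_args is hypothetical, and the reproduction in this report used the hive-site.xml route shown above.

```shell
# Hypothetical helper: prints the four dynamic-partition settings from
# the hive-site.xml fragment above as --hiveconf arguments.
dynpart_hiveconf_args() {
  for kv in \
    hive.exec.dynamic.partition=true \
    hive.exec.dynamic.partition.mode=nonstrict \
    hive.exec.max.dynamic.partitions.pernode=2000 \
    hive.exec.max.dynamic.partitions=2000
  do
    printf '%s %s ' --hiveconf "$kv"
  done
}

# Example use (left commented out; requires a Hive installation):
# hive $(dynpart_hiveconf_args)
```

Settings passed this way apply only to the session that was started with them.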

Upload the attached file tempsource.data to HDFS from the command line:

hadoop fs -put tempsource.data /

Then run the following in the Hive CLI:

hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM tempsource;
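
If the attached tempsource.data is not available, a stand-in file with the same three comma-separated columns (x INT, y STRING, z STRING) could be generated roughly as follows. This is a sketch under assumptions: the contents of the real attachment are not reproduced in this report, so the row count, the y values, and the choice of 100 distinct z values are all invented for illustration. The z range is chosen so that a z=66 partition exists, since STEP 6 writes into that directory.

```shell
# Hypothetical stand-in for the attached tempsource.data.
# Usage: gen_tempsource ROWS OUTPUT_FILE
gen_tempsource() {
  rows=$1
  out=$2
  awk -v n="$rows" 'BEGIN {
    for (i = 1; i <= n; i++) {
      # x: integer, y: arbitrary string, z: partition key in 0..99
      printf "%d,row%d,%d\n", i, i, i % 100
    }
  }' > "$out"
}

# 2000 rows spread over 100 partitions, well under the configured limits.
gen_tempsource 2000 tempsource.data
```

The generated file would need to be uploaded with hadoop fs -put before the LOAD DATA statement is run.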

STEP 4. Mount NFS on cluster

STEP 5. Run teragen test application

Use a separate console.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 -Dmapreduce.map.cpu.vcores=0 1000000000 /user/hdfs/input

STEP 6. Create many test files

Use a separate console.

cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
for i in `seq 1 10000`; do dd if=/dev/urandom of=tempfile$i bs=1M count=1;
done

STEP 7. Run the following query repeatedly in another console

Use a separate console.

hive> insert overwrite table test3 select x,y from ( select x,y,z from (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z from ptest1 where x > 5 and x < 1000) a)b;
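
Because the failure is intermittent, rerunning the query by hand is tedious. A small wrapper along the following lines could repeat a command until it first exits non-zero. It is only a sketch: the function name run_until_failure and the file name step7.sql are hypothetical, and it assumes the query above has been saved to that file so that it can be run with hive -f.

```shell
# run_until_failure MAX CMD [ARGS...]
# Re-runs CMD until it exits non-zero or MAX attempts have been made.
# Prints which attempt failed and returns 1, or returns 0 if none failed.
run_until_failure() {
  max=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on attempt $attempt"
      return 1
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in $max attempts"
  return 0
}

# Example use (left commented out; requires Hive and the saved query):
# run_until_failure 500 hive -f step7.sql
```

Stopping at the first failure keeps the failing run's console output at the end of the log, where it is easy to find.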

After running for some time, the query fails with the following exception.

Status: Failed
Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01,
diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed,
vertex=vertex_1443452487059_0426_1_01 [Map 3],
java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
    at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
]
Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00,
diagnostics=[Vertex received Kill in INITED state., Vertex
vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask
