Apache Tez / TEZ-3074

Multithreading issue java.lang.ArrayIndexOutOfBoundsException: -1 while working with Tez


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.5.3
    • 0.5.3
    • None
    • None
    • Important

    Description

      STEP 1. Install and configure Tez on YARN

      STEP 2. Configure Hive for Tez

      STEP 3. Create test tables in Hive and fill them with data

      Enable dynamic partitioning in Hive: add the following properties to hive-site.xml and restart Hive.

      <!-- DYNAMIC PARTITION -->
      
      <property>
        <name>hive.exec.dynamic.partition</name>
        <value>true</value>
      </property>
      
      <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
      </property>
      
      <property>
        <name>hive.exec.max.dynamic.partitions.pernode</name>
        <value>2000</value>
      </property>
      
      <property>
        <name>hive.exec.max.dynamic.partitions</name>
        <value>2000</value>
      </property>
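
For a quick test without editing hive-site.xml, the same four settings can probably be passed per session through the Hive CLI's --hiveconf option. This is only a sketch: the wrapper name hive_dynpart and the HIVE_BIN variable are made up for illustration, and it assumes none of these properties is restricted on the cluster.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Hive): starts the Hive CLI with the
# dynamic-partition settings from hive-site.xml applied for this session only.
# HIVE_BIN is an illustrative override for the hive executable; defaults to "hive".
hive_dynpart() {
  "${HIVE_BIN:-hive}" \
    --hiveconf hive.exec.dynamic.partition=true \
    --hiveconf hive.exec.dynamic.partition.mode=nonstrict \
    --hiveconf hive.exec.max.dynamic.partitions.pernode=2000 \
    --hiveconf hive.exec.max.dynamic.partitions=2000 \
    "$@"
}
```

Calling hive_dynpart with no arguments opens an interactive session; extra arguments such as -e "<query>" are passed through unchanged.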
      

      Upload the attached file tempsource.data to HDFS from the command line:

      hadoop fs -put tempsource.data /
      

      Then run the following statements in the Hive CLI:

      hive> CREATE TABLE test3 (x INT, y STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      hive> CREATE TABLE ptest1 (x INT, y STRING) PARTITIONED BY (z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      hive> CREATE TABLE tempsource (x INT, y STRING, z STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
      hive> LOAD DATA INPATH '/tempsource.data' OVERWRITE INTO TABLE tempsource;
      hive> INSERT OVERWRITE TABLE ptest1 PARTITION (z) SELECT x,y,z FROM tempsource;
      

      STEP 4. Mount NFS on cluster

      STEP 5. Run teragen test application

      Use a separate console:

      hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.5.1.jar teragen -Dmapred.map.tasks=7 -Dmapreduce.map.disk=0 -Dmapreduce.map.cpu.vcores=0 1000000000 /user/hdfs/input
      

      STEP 6. Create many test files

      Use a separate console:

      cd /hdfs/cluster/user/hive/warehouse/ptest1/z=66
      for i in $(seq 1 10000); do
        dd if=/dev/urandom of="tempfile$i" bs=1M count=1
      done
      

      STEP 7. Run the following query repeatedly in another console

      Use a separate console:

      hive> insert overwrite table test3 select x,y from ( select x,y,z from (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z from ptest1 where x > 5 and x < 1000) a)b;
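
Re-running the query by hand is tedious, so the repetition could be scripted. The helper below is a sketch, not part of the original reproduction: the function name run_until_failure and the run limit are assumptions. It re-runs any command until it exits with a non-zero status or the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to max_runs times and stop at the
# first failure. Prints which run failed and returns 1, or returns 0 if
# every run succeeded.
run_until_failure() {
  max_runs=$1
  shift
  i=1
  while [ "$i" -le "$max_runs" ]; do
    if ! "$@"; then
      echo "failed on run $i"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max_runs runs"
  return 0
}
```

For example, with an arbitrary limit of 100 runs and the Hive CLI's -e option:

run_until_failure 100 hive -e "insert overwrite table test3 select x,y from ( select x,y,z from (select x,y,z from ptest1 where x > 5 and x < 1000 union all select x,y,z from ptest1 where x > 5 and x < 1000) a)b;"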
      

      After running for some time, the query fails with the following exception:

      Status: Failed
      Vertex failed, vertexName=Map 3, vertexId=vertex_1443452487059_0426_1_01, diagnostics=[Vertex vertex_1443452487059_0426_1_01 [Map 3] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ptest1 initializer failed, vertex=vertex_1443452487059_0426_1_01 [Map 3], java.lang.ArrayIndexOutOfBoundsException: -1
          at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:395)
          at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:579)
          at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:359)
          at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:300)
          at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:402)
          at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:132)
          at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
          at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
          at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
          at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
      ]
      Vertex killed, vertexName=Map 1, vertexId=vertex_1443452487059_0426_1_00, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1443452487059_0426_1_00 [Map 1] killed/failed due to:null]
      DAG failed due to vertex failure. failedVertices:1 killedVertices:1
      FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
      

      Attachments

        1. tempsource.data
          2.26 MB
          Oleksiy Sayankin

          People

            Assignee: Unassigned
            Reporter: Oleksiy Sayankin (osayankin)