Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.1.0, 2.2.0
None
None
Description
Concurrency issues can occur in the multi-threaded moveFile issued while processing queries such as INSERT OVERWRITE TABLE ... SELECT ... when the staging directory, which is a subdirectory of the target directory, contains multiple files. The issue is hard to reproduce, but the following stack trace is one example:
INFO : Loading data to table functional_text_gzip.alltypesaggmultifilesnopart from hdfs://localhost:20500/test-warehouse/alltypesaggmultifilesnopart_text_gzip/.hive-staging_hive_2016-12-01_19-58-21_712_8968735301422943318-1/-ext-10000
ERROR : Failed with exception java.lang.ArrayIndexOutOfBoundsException
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2858)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3124)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1701)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:313)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1976)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1689)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1421)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1205)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1200)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
    at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
Getting log thread is interrupted, since query is done!
    at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at java.util.ArrayList.removeRange(ArrayList.java:616)
    at java.util.ArrayList$SubList.removeRange(ArrayList.java:1021)
    at java.util.AbstractList.clear(AbstractList.java:234)
    at com.google.common.collect.Iterables.removeIfFromRandomAccessList(Iterables.java:213)
    at com.google.common.collect.Iterables.removeIf(Iterables.java:184)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.removeBaseAclEntries(Hadoop23Shims.java:865)
    at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:757)
    at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2835)
    at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2828)
    ... 4 more
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
A quick online search turns up other instances, such as the one reported at http://stackoverflow.com/questions/38900333/get-concurrentmodificationexception-in-step-2-create-intermediate-flat-hive-tab
The issue appears to originate in the following code:
    if (aclEnabled) {
      aclStatus = sourceStatus.getAclStatus();
      if (aclStatus != null) {
        LOG.trace(aclStatus.toString());
        aclEntries = aclStatus.getEntries();
        // modifies aclEntries in place
        removeBaseAclEntries(aclEntries);

        // the ACL APIs also expect the traditional user/group/other permissions in the form of ACL entries
        aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, sourcePerm.getUserAction()));
        aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, sourcePerm.getGroupAction()));
        aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, sourcePerm.getOtherAction()));
      }
    }
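removeBaseAclEntries removes entries from the List<AclEntry> aclEntries in place, and the subsequent add() calls modify the same list. When the HDFSUtils.setFullFileStatus() method is called from multiple threads, as it is from https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2835, several threads may modify the same List<AclEntry> aclEntries at once. The stack trace shows that this list is a java.util.ArrayList, which is not thread-safe, so these concurrent modifications can fail with errors such as the ArrayIndexOutOfBoundsException above.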
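One way to keep threads from writing to a shared list is to have each call copy the list before mutating it. The sketch below is a self-contained model, not Hive code: `Entry` is a hypothetical stand-in for Hadoop's `AclEntry`, `buildAclSpec` is a hypothetical helper that mirrors the quoted block, and treating entries with a `null` name as the base entries is an assumption about what `removeBaseAclEntries` filters. Every call mutates only its own `ArrayList`, so many threads can build specs from the same source list because that list is only ever read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AclSpecSketch {
    // Hypothetical stand-in for Hadoop's AclEntry (not the real class).
    // In this model, name == null marks a base user/group/other entry.
    record Entry(String scope, String type, String name, String perm) {}

    // Hypothetical helper mirroring the quoted block, but without mutating
    // the caller's list: all mutations happen on a private copy.
    static List<Entry> buildAclSpec(List<Entry> shared,
                                    String userPerm, String groupPerm, String otherPerm) {
        List<Entry> entries = new ArrayList<>(shared); // per-call copy
        // Assumption: the base entries are exactly the unnamed ones.
        entries.removeIf(e -> e.name() == null);
        entries.add(new Entry("ACCESS", "USER", null, userPerm));
        entries.add(new Entry("ACCESS", "GROUP", null, groupPerm));
        entries.add(new Entry("ACCESS", "OTHER", null, otherPerm));
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Shared source list: read by every task, never written.
        List<Entry> shared = new ArrayList<>(List.of(
                new Entry("ACCESS", "USER", null, "rwx"),
                new Entry("ACCESS", "USER", "hive", "r-x"),
                new Entry("ACCESS", "GROUP", null, "r-x"),
                new Entry("ACCESS", "OTHER", null, "---")));

        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<List<Entry>>> results = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                results.add(pool.submit(() -> buildAclSpec(shared, "rw-", "r--", "---")));
            }
            for (Future<List<Entry>> f : results) {
                // Each spec: the named "hive" entry plus three fresh base entries.
                if (f.get().size() != 4) {
                    throw new IllegalStateException("unexpected ACL spec");
                }
            }
        } finally {
            pool.shutdown();
        }
        System.out.println("shared list size: " + shared.size()); // still 4
    }
}
```

Because the shared list is never written, no lock is needed; the trade-off is one small list copy per file.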
We should either move that block into a thread-safe region or call setFullFileStatus only after all the threads have converged.
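The second option, applying file status only after the parallel moves converge, might look roughly like the following. This is a hypothetical, self-contained model rather than Hive's moveFile: `moveOne` stands in for renaming one staged file, and `applyStatus` stands in for setFullFileStatus and deliberately mutates a shared, non-thread-safe list the way the quoted block does. The moves run on a thread pool; the calling thread waits on every `Future` and only then applies status one file at a time, so only one thread ever touches the shared list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConvergeSketch {
    // Hypothetical parallel step: stands in for renaming one staged file.
    static String moveOne(String src) {
        return src.replace("/.staging/", "/");
    }

    // Hypothetical status step: mutates a shared, non-thread-safe list the way
    // the quoted ACL block does. Safe here only because one thread calls it.
    // dest is where real code would set permissions; unused in this model.
    static void applyStatus(String dest, List<String> sharedAcl) {
        sharedAcl.removeIf(e -> e.startsWith("base:"));
        sharedAcl.add("base:user");
        sharedAcl.add("base:group");
        sharedAcl.add("base:other");
    }

    static List<String> moveAll(List<String> sources, List<String> sharedAcl, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Parallel phase: only the moves run on the pool.
            List<Future<String>> moves = new ArrayList<>();
            for (String src : sources) {
                moves.add(pool.submit(() -> moveOne(src)));
            }
            // Convergence point: wait for every move to finish.
            List<String> dests = new ArrayList<>();
            for (Future<String> f : moves) {
                dests.add(f.get());
            }
            // Serial phase: status is applied on the calling thread only.
            for (String dest : dests) {
                applyStatus(dest, sharedAcl);
            }
            return dests;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for moves", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a move failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> acl = new ArrayList<>(List.of("user:hive", "base:user"));
        List<String> dests = moveAll(
                List.of("/t/.staging/f0", "/t/.staging/f1", "/t/.staging/f2"), acl, 4);
        System.out.println(dests); // [/t/f0, /t/f1, /t/f2]
        System.out.println(acl);   // [user:hive, base:user, base:group, base:other]
    }
}
```

The cost of this option is that the status updates lose their parallelism; with many files, that serial phase could become noticeable.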