Hive / HIVE-15355

Concurrency issues during parallel moveFile due to HDFSUtils.setFullFileStatus

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: None
    • Labels: None

      Description

      Queries such as INSERT OVERWRITE TABLE ... SELECT ... can run into concurrency issues in the multi-threaded moveFile when the staging directory, which is a subdirectory of the target directory, contains multiple files. The issue is hard to reproduce, but the following stack trace is one example:

      INFO  : Loading data to table functional_text_gzip.alltypesaggmultifilesnopart from hdfs://localhost:20500/test-warehouse/alltypesaggmultifilesnopart_text_gzip/.hive-staging_hive_2016-12-01_19-58-21_712_8968735301422943318-1/-ext-10000
      ERROR : Failed with exception java.lang.ArrayIndexOutOfBoundsException
      org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
              at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2858)
              at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3124)
              at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1701)
              at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:313)
              at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
              at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
              at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1976)
              at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1689)
              at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1421)
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1205)
              at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1200)
              at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
              at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
              at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
      Getting log thread is interrupted, since query is done!
              at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.run(FutureTask.java:262)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.ArrayIndexOutOfBoundsException
              at java.lang.System.arraycopy(Native Method)
              at java.util.ArrayList.removeRange(ArrayList.java:616)
              at java.util.ArrayList$SubList.removeRange(ArrayList.java:1021)
              at java.util.AbstractList.clear(AbstractList.java:234)
              at com.google.common.collect.Iterables.removeIfFromRandomAccessList(Iterables.java:213)
              at com.google.common.collect.Iterables.removeIf(Iterables.java:184)
              at org.apache.hadoop.hive.shims.Hadoop23Shims.removeBaseAclEntries(Hadoop23Shims.java:865)
              at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:757)
              at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2835)
              at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2828)
              ... 4 more
      
      ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
      

      A quick online search turns up other instances, such as the one reported at http://stackoverflow.com/questions/38900333/get-concurrentmodificationexception-in-step-2-create-intermediate-flat-hive-tab

      The issue seems to come from the code below:

      if (aclEnabled) {
        aclStatus = sourceStatus.getAclStatus();
        if (aclStatus != null) {
          LOG.trace(aclStatus.toString());
          aclEntries = aclStatus.getEntries();
          removeBaseAclEntries(aclEntries);

          //the ACL api's also expect the tradition user/group/other permission in the form of ACL
          aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, sourcePerm.getUserAction()));
          aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, sourcePerm.getGroupAction()));
          aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, sourcePerm.getOtherAction()));
        }
      }
      

      removeBaseAclEntries removes objects from the List<AclEntry> aclEntries in place. When the HDFSUtils.setFullFileStatus() method is called from multiple threads, as it is from https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2835, several threads can modify the same List<AclEntry> aclEntries at once. The stack trace above shows the list is an ArrayList, which is not thread-safe, so the concurrent modifications lead to failures like the ArrayIndexOutOfBoundsException.
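
      The sharing at the root of the problem can be illustrated without any Hadoop classes. The following is a minimal sketch, not Hive code: FakeAclStatus and SharedListDemo are hypothetical names, plain strings stand in for AclEntry, and the sketch assumes (as the stack trace suggests) that getEntries() hands every caller the same mutable ArrayList rather than a copy. The removeIf predicate is only a stand-in for removeBaseAclEntries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for AclStatus; strings stand in for AclEntry objects.
final class FakeAclStatus {
    private final List<String> entries = new ArrayList<>(
        Arrays.asList("user::rwx", "user:hive:rwx", "group::r-x", "other::r--"));

    // Assumption: the same mutable list is returned on every call, with no copy.
    List<String> getEntries() {
        return entries;
    }
}

final class SharedListDemo {
    // True when two callers end up holding the very same list object.
    static boolean callersShareOneList(FakeAclStatus status) {
        List<String> seenByFirstCaller = status.getEntries();
        List<String> seenBySecondCaller = status.getEntries();
        return seenByFirstCaller == seenBySecondCaller;
    }

    // An in-place removal through one reference changes what the other caller sees.
    static int sizeSeenByOtherCallerAfterRemoval(FakeAclStatus status) {
        List<String> first = status.getEntries();
        List<String> second = status.getEntries();
        first.removeIf(e -> e.contains("::")); // stand-in for removeBaseAclEntries
        return second.size();
    }
}
```

      Run sequentially, this only shows that both callers observe the same mutation. When the callers are separate threads, the same in-place removals and additions overlap on one unsynchronized ArrayList, which is the situation described above.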

      We should either move that block into a thread-safe region or call setFullFileStatus when all the threads converge.
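
      The first option, a thread-safe region, could look roughly like the sketch below. It is an illustration under stated assumptions, not the change made in the attached patches: SynchronizedAclSketch, rebuildEntries and runConcurrently are hypothetical names, strings stand in for AclEntry, and the remove-then-add sequence mirrors the block quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: guard the shared-list mutation with a lock.
final class SynchronizedAclSketch {
    // Stand-in for the quoted block: drop base entries, then re-add user/group/other.
    static void rebuildEntries(List<String> aclEntries) {
        synchronized (aclEntries) { // only one thread mutates the list at a time
            aclEntries.removeIf(e -> e.contains("::"));
            aclEntries.add("user::rwx");
            aclEntries.add("group::r-x");
            aclEntries.add("other::r--");
        }
    }

    // Runs the rebuild from many pool threads against one shared list.
    static List<String> runConcurrently(int tasks, int threads) {
        List<String> shared = new ArrayList<>(
            Arrays.asList("user::rwx", "user:hive:rwx", "group::r-x", "other::r--"));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> rebuildEntries(shared)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and surface any task failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return shared;
    }
}
```

      Because each critical section removes the base entries and re-adds them atomically, the list ends in the same state no matter how the tasks interleave. The trade-off is that this part of the work is serialized, and any other code that reads or writes the same list would have to take the same lock.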

        Attachments

        1. HIVE-15355.01.patch
          2 kB
          Vihang Karajgaonkar
        2. HIVE-15355.02.patch
          6 kB
          Vihang Karajgaonkar

            People

            • Assignee: Vihang Karajgaonkar (vihangk1)
            • Reporter: Vihang Karajgaonkar (vihangk1)