Hadoop Map/Reduce / MAPREDUCE-7082

Fix FileInputFormat throwing java.lang.ArrayIndexOutOfBoundsException: 0


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: mrv1
    • Labels: None
    • Environment: CentOS 7, Hive 1.2.1, Hadoop 2.7.1

    Description

      When HDFS has missing blocks and MapReduce creates input splits with
      FileInputFormat, it throws an ArrayIndexOutOfBoundsException like this:

      java.lang.ArrayIndexOutOfBoundsException: 0
      at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708)
      at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675)
      at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:365)
      at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getSplits(DeprecatedLzoTextInputFormat.java:129)
      at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
      at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:408)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:363)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:355)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:231)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
      at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
      at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
      at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
      

      Relevant part of getSplits(JobConf job, int numSplits):

      if (isSplitable(fs, path)) {
        long blockSize = file.getBlockSize();
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
      
        long bytesRemaining = length;
        while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length-bytesRemaining, splitSize, clusterMap);
          splits.add(makeSplit(path, length-bytesRemaining, splitSize,
              splitHosts[0], splitHosts[1]));
          bytesRemaining -= splitSize;
        }
      
        if (bytesRemaining != 0) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations, length
              - bytesRemaining, bytesRemaining, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
              splitHosts[0], splitHosts[1]));
        }
      } else {
        if (LOG.isDebugEnabled()) {
        // Log only if the file is big enough to be split
          if (length > Math.min(file.getBlockSize(), minSize)) {
            LOG.debug("File is not splittable so no parallelization "
                + "is possible: " + file.getPath());
          }
        }
        String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,0,length,clusterMap);
        splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
      }
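
      For context, the loop above cuts the file at multiples of splitSize and asks
      for host information once per offset. Below is a minimal, self-contained
      sketch of that arithmetic; the class name and the file/block sizes are
      illustrative assumptions, not from this report. In this version of the class,
      computeSplitSize is max(minSize, min(goalSize, blockSize)) and SPLIT_SLOP
      is 1.1:

      public class SplitWalk {
        // Same shape as FileInputFormat.computeSplitSize.
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
          return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
          final double SPLIT_SLOP = 1.1;   // slop factor used by FileInputFormat
          long length = 300L << 20;        // a hypothetical 300 MB file
          long splitSize = computeSplitSize(length, 1L, 128L << 20); // 128 MB blocks

          long bytesRemaining = length;
          while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            // getSplitHostsAndCachedHosts is consulted once per offset here.
            System.out.println("split at offset " + (length - bytesRemaining)
                + ", size " + splitSize);
            bytesRemaining -= splitSize;
          }
          if (bytesRemaining != 0) {
            // Prints "last split at offset 268435456, size 46137344" here.
            System.out.println("last split at offset " + (length - bytesRemaining)
                + ", size " + bytesRemaining);
          }
        }
      }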
      

      Relevant part of getSplitHostsAndCachedHosts(BlockLocation[] blkLocations,
      long offset, long splitSize, NetworkTopology clusterMap):

      allTopos = blkLocations[index].getTopologyPaths();
      
      // If no topology information is available, just
      // prefix a fakeRack
      if (allTopos.length == 0) {
        allTopos = fakeRacks(blkLocations, index);
      }
      
      ...
      
      return new String[][] { identifyHosts(allTopos.length, racksMap),
          new String[0]};
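
      For reference, the fakeRacks fallback used above derives its fake topology
      entries from getHosts(). The body below is a reconstruction of that helper
      from the same class, so treat the details as an assumption; the point is that
      a block reporting no hosts still yields an empty array, which is how
      allTopos.length, and with it replicationFactor, ends up 0:

      import java.io.IOException;

      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.net.NetworkTopology;

      class FakeRacksSketch {
        // Reconstructed: a missing block reports no hosts, so this fallback
        // returns a zero-length array as well.
        static String[] fakeRacks(BlockLocation[] blkLocations, int index)
            throws IOException {
          String[] allHosts = blkLocations[index].getHosts();
          String[] allTopos = new String[allHosts.length];
          for (int i = 0; i < allHosts.length; i++) {
            allTopos[i] = NetworkTopology.DEFAULT_RACK + "/" + allHosts[i];
          }
          return allTopos;
        }
      }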
      

      Relevant part of identifyHosts(int replicationFactor, Map<Node,NodeInfo> racksMap):

      String [] retVal = new String[replicationFactor];
      
      ...
      
      retVal[index++] = host.node.getName().split(":")[0];

      Because blkLocations[index].getTopologyPaths() is empty and
      blkLocations[index].getHosts() is empty too, replicationFactor is 0 and
      retVal is a zero-length array, so executing

      retVal[index++] = host.node.getName().split(":")[0];

      throws ArrayIndexOutOfBoundsException: 0.
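
      The mechanism is easy to reproduce in isolation. The sketch below uses
      illustrative class and variable names, not Hadoop's, and mirrors the shape of
      the loop above, where (in the full method) the bound is only tested after
      each write, so a zero-length retVal overruns on the very first host. My
      reading of the code is that racksMap can be non-empty even when
      replicationFactor is 0 because it is accumulated over all blocks overlapping
      the split, while replicationFactor comes from allTopos.length of a single
      block:

      import java.util.Arrays;
      import java.util.List;

      public class IdentifyHostsRepro {
        // Stand-in for the identifyHosts loop: retVal has replicationFactor
        // slots, and each host is written before any bound is checked.
        static String[] identifyHosts(int replicationFactor, List<String> hosts) {
          String[] retVal = new String[replicationFactor];
          int index = 0;
          for (String host : hosts) {
            retVal[index++] = host.split(":")[0]; // AIOOBE: 0 when factor == 0
            if (index == replicationFactor) {
              break;
            }
          }
          return retVal;
        }

        public static void main(String[] args) {
          // replicationFactor == 0 (block with no topology info and no hosts)
          // while the aggregated host list is non-empty: this call throws
          // java.lang.ArrayIndexOutOfBoundsException: 0.
          System.out.println(Arrays.toString(
              identifyHosts(0, Arrays.asList("datanode1:50010"))));
        }
      }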

      Attachments

        1. MAPREDUCE_7082.patch (2 kB, tartarus)
        2. MAPREDUCE_7082.001.patch (2 kB, tartarus)
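
      The attached patches are not reproduced here. Purely as an illustration of
      one possible direction (an assumption, not necessarily what the patches do),
      a defensive guard at the top of identifyHosts would avoid the overrun:

      String[] retVal = new String[replicationFactor];
      if (replicationFactor == 0) {
        // No replica information for this block (e.g. a missing block):
        // return the empty array instead of overrunning it below.
        return retVal;
      }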


          People

            Assignee: tartarus
            Reporter: tartarus

            Dates

              Created:
              Updated:
