Hadoop Map/Reduce / MAPREDUCE-7082

Fix FileInputFormat throwing java.lang.ArrayIndexOutOfBoundsException: 0


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: mrv1
    • Labels: None
    • Environment: CentOS 7, Hive 1.2.1, Hadoop 2.7.1

    Description

      When HDFS has missing blocks and MapReduce creates input splits with
      FileInputFormat, it throws an ArrayIndexOutOfBoundsException like this:

      java.lang.ArrayIndexOutOfBoundsException: 0
      at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:708)
      at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:675)
      at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:365)
      at com.hadoop.mapred.DeprecatedLzoTextInputFormat.getSplits(DeprecatedLzoTextInputFormat.java:129)
      at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
      at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:408)
      at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:363)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:355)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:231)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
      at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
      at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
      at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
      

      Relevant part of getSplits(JobConf job, int numSplits):

      if (isSplitable(fs, path)) {
        long blockSize = file.getBlockSize();
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
      
        long bytesRemaining = length;
        while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length-bytesRemaining, splitSize, clusterMap);
          splits.add(makeSplit(path, length-bytesRemaining, splitSize,
              splitHosts[0], splitHosts[1]));
          bytesRemaining -= splitSize;
        }
      
        if (bytesRemaining != 0) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations, length
              - bytesRemaining, bytesRemaining, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
              splitHosts[0], splitHosts[1]));
        }
      } else {
        if (LOG.isDebugEnabled()) {
        // Log only if the file is big enough to be split
          if (length > Math.min(file.getBlockSize(), minSize)) {
            LOG.debug("File is not splittable so no parallelization "
                + "is possible: " + file.getPath());
          }
        }
        String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,0,length,clusterMap);
        splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
      }
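
      For context, the loop above cuts the file at multiples of splitSize and asks
      for host information once per offset. Below is a minimal, self-contained
      sketch of that arithmetic; the class name and the file/block sizes are
      illustrative assumptions, not from this report. In this version of the class,
      computeSplitSize is max(minSize, min(goalSize, blockSize)) and SPLIT_SLOP
      is 1.1:

      public class SplitWalk {
        // Same shape as FileInputFormat.computeSplitSize.
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
          return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
          final double SPLIT_SLOP = 1.1;   // slop factor used by FileInputFormat
          long length = 300L << 20;        // a hypothetical 300 MB file
          long splitSize = computeSplitSize(length, 1L, 128L << 20); // 128 MB blocks

          long bytesRemaining = length;
          while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            // getSplitHostsAndCachedHosts is consulted once per offset here.
            System.out.println("split at offset " + (length - bytesRemaining)
                + ", size " + splitSize);
            bytesRemaining -= splitSize;
          }
          if (bytesRemaining != 0) {
            // Prints "last split at offset 268435456, size 46137344" here.
            System.out.println("last split at offset " + (length - bytesRemaining)
                + ", size " + bytesRemaining);
          }
        }
      }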
      

      Relevant part of getSplitHostsAndCachedHosts(BlockLocation[] blkLocations,
      long offset, long splitSize, NetworkTopology clusterMap):

      allTopos = blkLocations[index].getTopologyPaths();
      
      // If no topology information is available, just
      // prefix a fakeRack
      if (allTopos.length == 0) {
        allTopos = fakeRacks(blkLocations, index);
      }
      
      ...
      
      return new String[][] { identifyHosts(allTopos.length, racksMap),
          new String[0]};
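
      For reference, the fakeRacks fallback used above derives its fake topology
      entries from getHosts(). The body below is a reconstruction of that helper
      from the same class, so treat the details as an assumption; the point is that
      a block reporting no hosts still yields an empty array, which is how
      allTopos.length, and with it replicationFactor, ends up 0:

      import java.io.IOException;

      import org.apache.hadoop.fs.BlockLocation;
      import org.apache.hadoop.net.NetworkTopology;

      class FakeRacksSketch {
        // Reconstructed: a missing block reports no hosts, so this fallback
        // returns a zero-length array as well.
        static String[] fakeRacks(BlockLocation[] blkLocations, int index)
            throws IOException {
          String[] allHosts = blkLocations[index].getHosts();
          String[] allTopos = new String[allHosts.length];
          for (int i = 0; i < allHosts.length; i++) {
            allTopos[i] = NetworkTopology.DEFAULT_RACK + "/" + allHosts[i];
          }
          return allTopos;
        }
      }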
      

      Relevant part of identifyHosts(int replicationFactor, Map<Node,NodeInfo> racksMap):

      String [] retVal = new String[replicationFactor];
      
      ...
      
      retVal[index++] = host.node.getName().split(":")[0];

      Because blkLocations[index].getTopologyPaths() is empty and
      blkLocations[index].getHosts() is empty too, replicationFactor is 0 and
      retVal is a zero-length array, so executing

      retVal[index++] = host.node.getName().split(":")[0];

      throws ArrayIndexOutOfBoundsException: 0.
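
      The mechanism is easy to reproduce in isolation. The sketch below uses
      illustrative class and variable names, not Hadoop's, and mirrors the shape of
      the loop above, where (in the full method) the bound is only tested after
      each write, so a zero-length retVal overruns on the very first host. My
      reading of the code is that racksMap can be non-empty even when
      replicationFactor is 0 because it is accumulated over all blocks overlapping
      the split, while replicationFactor comes from allTopos.length of a single
      block:

      import java.util.Arrays;
      import java.util.List;

      public class IdentifyHostsRepro {
        // Stand-in for the identifyHosts loop: retVal has replicationFactor
        // slots, and each host is written before any bound is checked.
        static String[] identifyHosts(int replicationFactor, List<String> hosts) {
          String[] retVal = new String[replicationFactor];
          int index = 0;
          for (String host : hosts) {
            retVal[index++] = host.split(":")[0]; // AIOOBE: 0 when factor == 0
            if (index == replicationFactor) {
              break;
            }
          }
          return retVal;
        }

        public static void main(String[] args) {
          // replicationFactor == 0 (block with no topology info and no hosts)
          // while the aggregated host list is non-empty: this call throws
          // java.lang.ArrayIndexOutOfBoundsException: 0.
          System.out.println(Arrays.toString(
              identifyHosts(0, Arrays.asList("datanode1:50010"))));
        }
      }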

      Attachments

        1. MAPREDUCE_7082.patch (2 kB, tartarus)
        2. MAPREDUCE_7082.001.patch (2 kB, tartarus)
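
      The attached patches are not reproduced here. Purely as an illustration of
      one possible direction (an assumption, not necessarily what the patches do),
      a defensive guard at the top of identifyHosts would avoid the overrun:

      String[] retVal = new String[replicationFactor];
      if (replicationFactor == 0) {
        // No replica information for this block (e.g. a missing block):
        // return the empty array instead of overrunning it below.
        return retVal;
      }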


          People

            Assignee: tartarus
            Reporter: tartarus

            Dates

              Created:
              Updated:
