Hadoop HDFS / HDFS-10691

FileDistribution fails in hdfs oiv command due to ArrayIndexOutOfBoundsException



    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed


      I used the hdfs oiv -p FileDistribution command to do a file distribution analysis, but an ArrayIndexOutOfBoundsException occurred and caused the process to terminate. The stack trace:

      Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 103
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.run(FileDistributionCalculator.java:243)
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.visit(FileDistributionCalculator.java:176)
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:176)
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:129)

      I looked into the code and found that the exception was thrown while incrementing the count for a bucket of the distribution. The reason for the exception is that the bucket number was not less than the distribution array's length.

      Here are my steps:
      1). The input command parameters:

      hdfs oiv -p FileDistribution -maxSize 104857600 -step 1024000

      In the code, numIntervals is 104857600 / 1024000 = 102 (the exact value, 102.4, is truncated by integer division), so the distribution's length is numIntervals + 1 = 103.
      2). The ArrayIndexOutOfBoundsException happens when the file size falls in the range ((maxSize/step)*step, maxSize]. For example, a file of size 104800000 lies in that range. Its bucket number is calculated as 104800000 / 1024000 ≈ 102.34, and the code then applies Math.ceil, so the final value is 103. But the distribution's length is also 103, which means the valid indexes run from 0 to 102. So the ArrayIndexOutOfBoundsException happens.
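      The arithmetic above can be reproduced in a short standalone sketch. This mirrors the report's numbers only; the class and method names are illustrative and not taken from the actual FileDistributionCalculator source:

      ```java
      // Sketch of the buggy bucket computation described in this report.
      public class BucketOverflowDemo {
          // Bucket index as computed by the original (pre-fix) logic.
          static int bucket(long fileSize, long maxSize, long steps) {
              return fileSize > maxSize
                      ? (int) (maxSize / steps)                      // last valid index
                      : (int) Math.ceil((double) fileSize / steps);
          }

          public static void main(String[] args) {
              long maxSize = 104857600L;  // -maxSize
              long steps = 1024000L;      // -step
              int numIntervals = (int) (maxSize / steps);        // 102 (102.4 truncated)
              long[] distribution = new long[numIntervals + 1];  // length 103, indexes 0..102

              long fileSize = 104800000L; // falls in ((maxSize / steps) * steps, maxSize]
              int b = bucket(fileSize, maxSize, steps);          // ceil(102.34...) = 103
              System.out.println(b);                    // 103
              System.out.println(distribution.length);  // 103 -> distribution[b] throws AIOOBE
          }
      }
      ```

      Because Math.ceil rounds the fractional quotient up, the computed index equals the array length instead of the last valid index.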

      In a word, the exception happens when maxSize is not evenly divisible by step and, at the same time, the file size falls in the range ((maxSize/step)*step, maxSize]. The related logic should be changed from

      int bucket = fileSize > maxSize ? distribution.length - 1 : (int) Math
                  .ceil((double)fileSize / steps);

      to

      int bucket =
                  fileSize >= maxSize || fileSize > (maxSize / steps) * steps ?
                      distribution.length - 1 : (int) Math.ceil((double) fileSize / steps);
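      A minimal sketch of the corrected condition, extracted into a helper so it can be exercised directly (the method and class names are hypothetical, not from the patch):

      ```java
      // Sketch of the proposed fix: any file whose size exceeds the last full
      // step boundary ((maxSize / steps) * steps) is clamped into the final
      // bucket, so the index can never reach distribution.length.
      public class FixedBucketDemo {
          static int bucket(long fileSize, long maxSize, long steps, int length) {
              return fileSize >= maxSize || fileSize > (maxSize / steps) * steps
                      ? length - 1
                      : (int) Math.ceil((double) fileSize / steps);
          }

          public static void main(String[] args) {
              long maxSize = 104857600L, steps = 1024000L;
              int length = (int) (maxSize / steps) + 1; // 103

              // The problematic size from the report now lands in the last bucket.
              System.out.println(bucket(104800000L, maxSize, steps, length)); // 102
              // A size below the last full step boundary is bucketed as before.
              System.out.println(bucket(5000000L, maxSize, steps, length));   // 5
          }
      }
      ```

      Note the fix also handles fileSize == maxSize, which the old code missed: Math.ceil(104857600.0 / 1024000) is likewise 103, so the equality case would have overflowed too.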


        1. HDFS-10691.001.patch
          2 kB
          Yiqun Lin
        2. HDFS-10691.002.patch
          5 kB
          Yiqun Lin
        3. HDFS-10691.003.patch
          5 kB
          Yiqun Lin
        4. HDFS-10691-branch-2.7.patch
          5 kB
          Yiqun Lin



            Assignee: Yiqun Lin (linyiqun)
            Reporter: Yiqun Lin (linyiqun)
            Votes: 0
            Watchers: 5