Hadoop HDFS / HDFS-10691

FileDistribution fails in hdfs oiv command due to ArrayIndexOutOfBoundsException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      I used the hdfs oiv -p FileDistribution command to analyze a fsimage, but an ArrayIndexOutOfBoundsException occurred and terminated the process. The stack trace:

      Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 103
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.run(FileDistributionCalculator.java:243)
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.FileDistributionCalculator.visit(FileDistributionCalculator.java:176)
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:176)
      	at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:129)
      

      I looked into the code and found the exception was thrown while incrementing a count in the distribution array. The cause is that the computed bucket number exceeded the distribution array's highest valid index.

      Here are my steps:
      1) The input command parameters:

      hdfs oiv -p FileDistribution -maxSize 104857600 -step 1024000
      

      The numIntervals computed in the code is 104857600 / 1024000 = 102 (integer division truncates the real value 102.4), so the distribution array's length is numIntervals + 1 = 103.
      2) The ArrayIndexOutOfBoundsException happens when the file size falls in the range ((maxSize/step)*step, maxSize]. For example, a file of size 104800000 lies in that range. Its bucket is calculated as 104800000/1024000 = 102.3, and the code takes Math.ceil of this, giving 103. But the distribution array's length is also 103, so its valid indices run from 0 to 102, and the ArrayIndexOutOfBoundsException is thrown.
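The off-by-one above can be reproduced with a standalone sketch of the bucket arithmetic (the variable names mirror FileDistributionCalculator, but this is an illustration, not the actual class):

```java
public class BucketOverflowDemo {
    public static void main(String[] args) {
        long maxSize = 104857600L;   // -maxSize argument
        int steps = 1024000;         // -step argument

        // Integer division truncates: 104857600 / 1024000 = 102, not 102.4
        int numIntervals = (int) (maxSize / steps);
        int[] distribution = new int[numIntervals + 1]; // length 103, indices 0..102

        long fileSize = 104800000L;  // lies in ((maxSize/steps)*steps, maxSize]
        // fileSize <= maxSize, so the guard does not fire;
        // Math.ceil(102.34...) == 103, one past the last valid index.
        int bucket = fileSize > maxSize ? distribution.length - 1
                : (int) Math.ceil((double) fileSize / steps);

        System.out.println(distribution.length + " " + bucket); // prints "103 103"
        // distribution[bucket]++ would throw ArrayIndexOutOfBoundsException: 103
    }
}
```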

      In short, the exception happens whenever maxSize is not evenly divisible by step and a file's size falls in the range ((maxSize/step)*step, maxSize]. The related logic should be changed from

      int bucket = fileSize > maxSize ? distribution.length - 1 : (int) Math
                  .ceil((double)fileSize / steps);
      

      to

      int bucket =
                  fileSize >= maxSize || fileSize > (maxSize / steps) * steps ?
                      distribution.length - 1 : (int) Math.ceil((double) fileSize / steps);
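A quick check of the proposed guard, extracted into a helper method (a standalone sketch for illustration, not the actual patch), shows that boundary-range sizes now map to the last bucket instead of overflowing:

```java
public class BucketFixDemo {
    // Proposed bucket logic: any size at or past the last full step
    // boundary goes into the final bucket, distribution.length - 1.
    static int bucket(long fileSize, long maxSize, int steps, int length) {
        return fileSize >= maxSize || fileSize > (maxSize / steps) * steps
                ? length - 1
                : (int) Math.ceil((double) fileSize / steps);
    }

    public static void main(String[] args) {
        long maxSize = 104857600L;
        int steps = 1024000;
        int length = (int) (maxSize / steps) + 1; // 103, valid indices 0..102

        // The problematic size from the report now lands on the last index.
        System.out.println(bucket(104800000L, maxSize, steps, length)); // prints 102
        // A file of exactly maxSize also maps to the last bucket.
        System.out.println(bucket(maxSize, maxSize, steps, length));    // prints 102
        // Small files are unaffected: one full step maps to bucket 1.
        System.out.println(bucket(1024000L, maxSize, steps, length));   // prints 1
    }
}
```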
      

        Attachments

        1. HDFS-10691-branch-2.7.patch
          5 kB
          Yiqun Lin
        2. HDFS-10691.003.patch
          5 kB
          Yiqun Lin
        3. HDFS-10691.002.patch
          5 kB
          Yiqun Lin
        4. HDFS-10691.001.patch
          2 kB
          Yiqun Lin


            People

            • Assignee:
              linyiqun Yiqun Lin
              Reporter:
              linyiqun Yiqun Lin
            • Votes:
              0
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved: