Hadoop HDFS / HDFS-6608

FsDatasetCache: hard-coded 4096 value in test is not appropriate for all HW


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: None
    • Component/s: test
    • Labels: None
    • Environment: PPC64 (LE & BE, OpenJDK & IBM JVM, Ubuntu, RHEL 7 & RHEL 6.5)

    Description

      The value 4096 is hard-coded in HDFS code (product and tests).
      It appears 171 times, including 8 times in product (not tests) code:
      hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs : 163
      hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs : 4
      hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http : 3
      hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs : 1

      This value is used for different purposes: file sizes, block size, page size, etc.
      As a block size and page size, 4096 is appropriate for many systems, but not for PPC64, where the page size is 65536.
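
      As a point of reference, the page size can be queried at runtime instead of being assumed. Below is a minimal sketch (the class name is illustrative only) using the same NativeIO helper that the test code quoted further below already relies on:

      import org.apache.hadoop.io.nativeio.NativeIO;

      // Illustrative probe, not part of HDFS: prints the page size seen on the
      // current host, e.g. 4096 on most x86_64 Linux systems, 65536 on PPC64.
      public class PageSizeProbe {
        public static void main(String[] args) {
          long pageSize =
              NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
          System.out.println("OS page size: " + pageSize + " bytes");
        }
      }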

      Looking at the HDFS product (non-test) code, it seems (not 100% sure) that the code is OK, i.e. it does not use a hard-coded page/block size; however, someone should check this in depth. For example:

      this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();

      However, at the test level, the value 4096 is used in many places, and it is very hard to tell whether a given occurrence depends on the HW architecture or not.

      In the test TestFsDatasetCache#testPageRounder, the HW value is sometimes obtained from the system:
      private static final long PAGE_SIZE = NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
      private static final long BLOCK_SIZE = PAGE_SIZE;
      but there are several places where 4096, or another fixed size, is used even though the value should depend on the HW. For example, the test sets:

      conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);
      With:
      // Most Linux installs allow a default of 64KB locked memory
      private static final long CACHE_CAPACITY = 64 * 1024;
      However, for PPC64, this value should be much bigger.
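
      A possible direction (a sketch only; neither this class nor the multiplier of 16 exists in the current test) would be to size the locked-memory capacity from the detected page size, so that the same number of pages fits on every architecture:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hdfs.DFSConfigKeys;
      import org.apache.hadoop.io.nativeio.NativeIO;

      public class CacheCapacitySketch {
        // Page size detected from the OS, as TestFsDatasetCache already does for PAGE_SIZE.
        private static final long PAGE_SIZE =
            NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();

        // Illustrative multiplier: room for 16 pages whatever the architecture
        // (64 KB with 4 KB pages on x86_64, 1 MB with 64 KB pages on PPC64).
        private static final long CACHE_CAPACITY = 16 * PAGE_SIZE;

        static Configuration configure(Configuration conf) {
          conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);
          return conf;
        }
      }

      With such a definition, the existing conf.setLong(...) call would not need to change.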

      The TestFsDatasetCache#testPageRounder test aims to cache 5 blocks of 512 bytes each, each of which is rounded up to a full page. However, the page size is 65536 on PPC64 and 4096 on x86_64. Thus, the method in charge of reserving blocks in the HDFS cache proceeds in 4096-byte steps on x86_64 and in 65536-byte steps on PPC64, with a hard-coded limit: maxBytes = 65536 bytes.

      5 * 4096 = 20480: OK
      5 * 65536 = 327680: KO: the test ends with a timeout, since the limit is exceeded at the very beginning and the test keeps waiting.
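
      One way to avoid the silent timeout (a sketch only; neither this class nor the up-front check exists in the current test) is to verify before caching that the configured capacity can hold the pages the test is about to pin:

      import org.apache.hadoop.io.nativeio.NativeIO;

      public class PageRounderGuard {
        // Value currently hard-coded by the test ("Most Linux installs allow
        // a default of 64KB locked memory").
        private static final long CACHE_CAPACITY = 64 * 1024;

        // Illustrative pre-check: fail immediately with a clear message when the
        // bytes the page rounder will reserve cannot fit within the locked-memory
        // limit, instead of letting the test hang until its timeout.
        static void checkCapacity(int numPages) {
          long pageSize =
              NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
          long bytesNeeded = numPages * pageSize; // 20480 on x86_64, 327680 on PPC64 for 5 pages
          if (bytesNeeded > CACHE_CAPACITY) {
            throw new AssertionError("Locked-memory limit " + CACHE_CAPACITY
                + " bytes cannot hold " + numPages + " pages of " + pageSize + " bytes");
          }
        }
      }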

      As a conclusion, there are several issues to fix:

      • instead of many hard-coded 4096 values, the code (mainly the tests) should use Java constants built from HW values (e.g. NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize()); see the sketch after this list
      • several distinct constants must be used, since 4096 is used for different purposes, including some that do not depend on the HW
      • the test must be improved to handle the case where the limit is exceeded at the very beginning
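
      As an illustration of the first two points, a sketch that separates the HW-dependent sizes from the sizes that are genuinely fixed (the class and constant names are hypothetical; they do not exist in HDFS today):

      import org.apache.hadoop.io.nativeio.NativeIO;

      public final class TestSizes {
        private TestSizes() {}

        // HW-dependent: follows the OS page size (4096 on x86_64, 65536 on PPC64).
        public static final long OS_PAGE_SIZE =
            NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();

        // HW-dependent: a block size chosen to stay page-aligned on any platform.
        public static final long TEST_BLOCK_SIZE = OS_PAGE_SIZE;

        // Not HW-dependent: a buffer size that only happens to be 4 KB.
        public static final int IO_BUFFER_SIZE = 4096;
      }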

            People

              Assignee: Unassigned
              Reporter: Tony Reix (trex58)
              Votes: 1
              Watchers: 6
