Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-14304

High lock contention on hdfsHashMutex in libhdfs

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 3.3.0
    • hdfs-client, libhdfs, native
    • None
    • Reviewed

    Description

      While doing some performance profiling of an application using libhdfs, we noticed a high amount of lock contention on the hdfsHashMutex defined in hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h

      The issue is that every JNI method invocation done by hdfs.c goes through a helper method called invokeMethod. invokeMethod calls globalClassReference which acquires hdfsHashMutex while performing a lookup in a htable (a custom hash table that lives in libhdfs/common) (the lock is acquired for both reads and writes). The hash table maps char *className to jclass objects, it seems the goal of the hash table is to avoid repeatedly creating jclass objects for each JNI call.

      For multi-threaded applications, this lock severely limits that rate at which Java methods can be invoked. pstacks show a lot of time being spent on hdfsHashMutex

      #0  0x00007fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0
      #2  0x00007fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x00000000027d8386 in mutexLock ()
      #4  0x00000000027d0e7b in globalClassReference ()
      #5  0x00000000027d1160 in invokeMethod ()
      #6  0x00000000027d4176 in readDirect ()
      #7  0x00000000027d4325 in hdfsRead ()
      

      Same with perf report

      +   63.36%     0.01%  [k] system_call_fastpath
      +   61.60%     0.12%  [k] sys_futex 
      +   61.45%     0.13%  [k] do_futex 
      +   57.54%     0.49%  [k] _raw_qspin_lock
      +   57.07%     0.01%  [k] queued_spin_lock_slowpath
      +   55.47%    55.47%  [k] native_queued_spin_lock_slowpath
      -   35.68%     0.00%  [k] 0x6f6f6461682f6568
         - 0x6f6f6461682f6568 
            - 30.55% __lll_lock_wait       
               - 29.40% system_call_fastpath      
                  - 29.39% sys_futex      
                     - 29.35% do_futex   
                        - 29.27% futex_wait     
                           - 28.17% futex_wait_setup
                              - 27.05% _raw_qspin_lock 
                                 - 27.05% queued_spin_lock_slowpath
                                      26.30% native_queued_spin_lock_slowpath 
                                    + 0.67% ret_from_intr 
                           + 0.71% futex_wait_queue_me
            - 2.00% methodIdFromClass
               - 1.94% jni_GetMethodID  
                  - 1.71% get_method_id   
                       0.96% SymbolTable::lookup_only 
            - 1.61% invokeMethod
               - 0.62% jni_CallLongMethodV 
                    0.52% jni_invoke_nonstatic 
              0.75% pthread_mutex_lock
      

      Attachments

        Issue Links

          Activity

            People

              stakiar Sahil Takiar
              stakiar Sahil Takiar
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: