  1. Hadoop Common
  2. HADOOP-12107

Long-running apps may have a huge number of StatisticsData instances under FileSystem



    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.0
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.4, 3.0.0-alpha1
    • Component/s: fs
    • Labels: None


      We observed that some of our apps (non-MapReduce apps that use filesystems) accumulate a huge memory footprint from FileSystem$Statistics$StatisticsData instances (held in the allData list of Statistics).

      Although each StatisticsData refers to its thread only through a weak reference, which can be cleared once the thread goes away, the StatisticsData instances themselves are not removed from the list until one of the following methods is called on Statistics:

      • getBytesRead()
      • getBytesWritten()
      • getReadOps()
      • getLargeReadOps()
      • getWriteOps()
      • toString()

      It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the Statistics. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly.
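
      The growth pattern described above can be illustrated with a minimal, self-contained model. This is a sketch, not the Hadoop source: the class StatsLeakSketch, its Data entries, and the churn helper are hypothetical names, and clearing the weak reference by hand stands in for the garbage collector doing so after a thread dies.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Simplified, hypothetical model of per-thread statistics; not the Hadoop source. */
final class StatsLeakSketch {

    /** One entry per thread that ever touched the statistics. */
    static final class Data {
        final WeakReference<Thread> owner;
        long bytesRead;

        Data(Thread t) {
            this.owner = new WeakReference<>(t);
        }
    }

    private final List<Data> allData = new ArrayList<>();
    private long rootBytesRead; // totals folded in from entries whose thread is gone

    /** Write path: registers a new entry and never prunes old ones. */
    synchronized Data register(Thread t) {
        Data d = new Data(t);
        allData.add(d);
        return d;
    }

    /** Read path: the only place where entries of dead threads are removed. */
    synchronized long getBytesRead() {
        long total = rootBytesRead;
        for (Iterator<Data> it = allData.iterator(); it.hasNext(); ) {
            Data d = it.next();
            total += d.bytesRead;
            if (d.owner.get() == null) {
                rootBytesRead += d.bytesRead; // keep the count, drop the entry
                it.remove();
            }
        }
        return total;
    }

    synchronized int entryCount() {
        return allData.size();
    }

    /** Simulates thread churn: n short-lived threads each record 10 bytes. */
    static StatsLeakSketch churn(int n) {
        StatsLeakSketch stats = new StatsLeakSketch();
        for (int i = 0; i < n; i++) {
            Data d = stats.register(new Thread(() -> { }));
            d.bytesRead = 10;
            d.owner.clear(); // stand-in for the GC clearing the reference once the thread is gone
        }
        return stats;
    }

    public static void main(String[] args) {
        StatsLeakSketch stats = churn(1000);
        System.out.println("entries before read: " + stats.entryCount());
        System.out.println("bytes read: " + stats.getBytesRead());
        System.out.println("entries after read: " + stats.entryCount());
    }
}
```

      In this model the entry count keeps growing with every new thread and only drops when getBytesRead() happens to be called, while the aggregate value itself is preserved.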

      The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with FileSystem$Statistics itself in that the memory is controlled only as a side effect of those operations.
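
      The second workaround (invoking one of these operations occasionally) could be sketched roughly as follows. StatsPruner is a hypothetical helper, not part of Hadoop; it takes any LongSupplier, on the assumption that the application can pass a method reference such as stats::getBytesRead for a Statistics object it holds (for example one obtained from FileSystem.getAllStatistics(), if that fits the Hadoop version in use).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper: periodically calls a statistics getter so stale entries get pruned. */
final class StatsPruner implements AutoCloseable {

    // Single daemon thread, so the pruner never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stats-pruner");
            t.setDaemon(true);
            return t;
        });

    /**
     * @param getter any of the aggregate getters, e.g. stats::getBytesRead (assumed)
     * @param period delay between invocations
     * @param unit   time unit of the delay
     */
    StatsPruner(LongSupplier getter, long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                getter.getAsLong(); // result is ignored; the call is made for its pruning side effect
            } catch (RuntimeException e) {
                // Swallow so that one failure does not cancel the periodic task.
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

      A caller would create one pruner per Statistics object, for instance new StatsPruner(stats::getBytesRead, 1, TimeUnit.MINUTES), and close it on shutdown. This only bounds the growth between invocations; it does not remove the underlying dependence on a read-side call.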


        Attachments

        1. HADOOP-12107.001.patch (7 kB, Sangjin Lee)
        2. HADOOP-12107.002.patch (11 kB, Sangjin Lee)
        3. HADOOP-12107.003.patch (11 kB, Sangjin Lee)
        4. HADOOP-12107.004.patch (11 kB, Sangjin Lee)
        5. HADOOP-12107.005.patch (11 kB, Sangjin Lee)



        Assignee: Sangjin Lee (sjlee0)
        Reporter: Sangjin Lee (sjlee0)