Hadoop HDFS / HDFS-5276

FileSystem.Statistics has a performance issue on multi-threaded read/write.


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None
    • Target Version/s:

      Description

      FileSystem.Statistics is a singleton for each FS scheme, so every read/write on HDFS leads to an AtomicLong.getAndAdd() call on the same shared counter. AtomicLong does not perform well when many threads update it concurrently (say, more than 30 threads), so this can cause a serious performance issue. While profiling our Spark test, with 32 threads reading data from HDFS, about 70% of CPU time was spent in FileSystem.Statistics.incrementBytesRead().
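
      To illustrate the kind of change that avoids this contention, here is a minimal sketch of a per-thread counter: each thread updates its own cell, and cells are summed only when the total is read. This is an assumption-laden illustration, not the code from any of the attached patches; the class name PerThreadBytesRead, its methods, and the demo helper are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; not the class from any attached patch. */
class PerThreadBytesRead {
    // One cell per thread that has ever incremented; aggregated only on read.
    private final List<AtomicLong> cells = new CopyOnWriteArrayList<>();

    // Lazily creates and registers the calling thread's own cell.
    private final ThreadLocal<AtomicLong> local = ThreadLocal.withInitial(() -> {
        AtomicLong cell = new AtomicLong();
        cells.add(cell);
        return cell;
    });

    /** Hot path: touches only the calling thread's cell, so threads do not contend. */
    void incrementBytesRead(long n) {
        AtomicLong cell = local.get();
        // Single writer per cell, so a plain read plus an ordered write is enough.
        cell.lazySet(cell.get() + n);
    }

    /** Cold path: sums every registered cell. */
    long getBytesRead() {
        long total = 0;
        for (AtomicLong cell : cells) {
            total += cell.get();
        }
        return total;
    }

    /** Runs `threads` workers, each adding `n` bytes `incrementsPerThread` times. */
    static long demo(int threads, int incrementsPerThread, long n) {
        PerThreadBytesRead stats = new PerThreadBytesRead();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    stats.incrementBytesRead(n);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // join makes each worker's writes visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats.getBytesRead();
    }
}
```

      The trade-off in this sketch is that reads become more expensive (one pass over all cells) and cells of finished threads are retained so their counts are not lost. On Java 8 and later, java.util.concurrent.atomic.LongAdder applies a similar striping idea out of the box.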

        Attachments

        1. ThreadLocalStat.patch
          3 kB
          Binglin Chang
        2. TestFileSystemStatistics.java
          2 kB
          Binglin Chang
        3. jstack-trace.PNG
          118 kB
          Chengxiang Li
        4. hdfs-test.PNG
          122 kB
          Chengxiang Li
        5. HDFSStatisticTest.java
          2 kB
          Chengxiang Li
        6. HDFS-5276.003.patch
          15 kB
          Colin McCabe
        7. HDFS-5276.002.patch
          14 kB
          Colin McCabe
        8. HDFS-5276.001.patch
          14 kB
          Colin McCabe
        9. DisableFSReadWriteBytesStat.patch
          3 kB
          Binglin Chang

            People

            • Assignee: Colin McCabe (cmccabe)
            • Reporter: Chengxiang Li (chengxiang li)

              Dates

              • Created:
                Updated:
                Resolved:
