Hadoop HDFS / HDFS-5276

FileSystem.Statistics has a performance issue under multi-threaded read/write.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.4-alpha
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels: None

    Description

      FileSystem.Statistics is a singleton variable for each FS scheme, so every read/write on HDFS leads to an AtomicLong.getAndAdd() call. AtomicLong does not perform well under multi-threaded contention (say, more than 30 threads), so this can cause a serious performance issue. In our Spark test profile, with 32 threads reading data from HDFS, about 70% of CPU time was spent in FileSystem.Statistics.incrementBytesRead().
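
The contention pattern described above can be sketched in isolation. The example below is a minimal illustration, not Hadoop code and not the change made in the attached patches (their contents are not shown here). The class `StatisticsContentionSketch` and its methods are hypothetical names, and `LongAdder` (Java 8+) is swapped in as one well-known way to reduce contention on a hot counter. The workload numbers in `main` are assumptions chosen only to mirror the 32 threads mentioned in the report.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongConsumer;

// Hypothetical sketch: compares one shared AtomicLong against a LongAdder
// when many threads add to the same "bytes read" counter.
public class StatisticsContentionSketch {

    // Starts `threads` workers; each calls `increment` `opsPerThread` times
    // with `bytesPerOp`, then waits for all workers to finish.
    static void runWorkers(int threads, int opsPerThread, long bytesPerOp,
                           LongConsumer increment) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    increment.accept(bytesPerOp);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }

    // Pattern from the report: every thread updates a single AtomicLong.
    static long countWithAtomicLong(int threads, int opsPerThread, long bytesPerOp)
            throws InterruptedException {
        AtomicLong bytesRead = new AtomicLong();
        runWorkers(threads, opsPerThread, bytesPerOp, bytesRead::getAndAdd);
        return bytesRead.get();
    }

    // Assumed alternative: LongAdder spreads updates over internal cells
    // and sums them only when the total is read.
    static long countWithLongAdder(int threads, int opsPerThread, long bytesPerOp)
            throws InterruptedException {
        LongAdder bytesRead = new LongAdder();
        runWorkers(threads, opsPerThread, bytesPerOp, bytesRead::add);
        return bytesRead.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 32;          // matches the thread count in the report
        int opsPerThread = 1_000_000; // assumed workload size
        long bytesPerOp = 4096;       // assumed read size

        long t0 = System.nanoTime();
        long atomicTotal = countWithAtomicLong(threads, opsPerThread, bytesPerOp);
        long t1 = System.nanoTime();
        long adderTotal = countWithLongAdder(threads, opsPerThread, bytesPerOp);
        long t2 = System.nanoTime();

        System.out.printf("AtomicLong: total=%d, %d ms%n", atomicTotal, (t1 - t0) / 1_000_000);
        System.out.printf("LongAdder:  total=%d, %d ms%n", adderTotal, (t2 - t1) / 1_000_000);
    }
}
```

Both variants produce the same total; only the cost of each increment under contention differs. Timings depend heavily on the machine and JVM, so this is a rough way to observe the effect rather than a benchmark.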

      Attachments

        1. DisableFSReadWriteBytesStat.patch
          3 kB
          Binglin Chang
        2. HDFS-5276.001.patch
          14 kB
          Colin McCabe
        3. HDFS-5276.002.patch
          14 kB
          Colin McCabe
        4. HDFS-5276.003.patch
          15 kB
          Colin McCabe
        5. HDFSStatisticTest.java
          2 kB
          Chengxiang Li
        6. hdfs-test.PNG
          122 kB
          Chengxiang Li
        7. jstack-trace.PNG
          118 kB
          Chengxiang Li
        8. TestFileSystemStatistics.java
          2 kB
          Binglin Chang
        9. ThreadLocalStat.patch
          3 kB
          Binglin Chang

          People

            Assignee: Colin McCabe (cmccabe)
            Reporter: Chengxiang Li (chengxiang li)
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved:
