Hadoop HDFS / HDFS-354

Data node process consumes 180% cpu


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      I did a test on DFS read throughput and found that the data node
      process consumes up to 180% CPU when it is under heavy load. Here are the details:

      The cluster has 380+ machines, each with 3GB memory, 4 CPUs, and 4 disks.
      I copied a 10GB file to DFS from one machine with a data node running on it.
      Based on the DFS block placement policy, that machine has one replica of each block of the file.
      Then I ran 4 of the following commands in parallel:

      hadoop dfs -cat thefile > /dev/null &

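      The parallel-read step above can be sketched as a runnable script. This is a
      minimal sketch of the pattern only: a small local temp file stands in for the
      10GB DFS file so it runs anywhere; in the real test each background reader is
      `hadoop dfs -cat thefile > /dev/null` against the data node's local replica.

      ```shell
      #!/bin/sh
      # Stand-in for the 10GB DFS file from the description (assumption: a small
      # local file is enough to show the parallel-read pattern).
      testfile=$(mktemp)
      dd if=/dev/zero of="$testfile" bs=1M count=8 2>/dev/null

      # Launch 4 whole-file readers in parallel, as in the reported test.
      # In the real test this line is: hadoop dfs -cat thefile > /dev/null &
      for i in 1 2 3 4; do
        cat "$testfile" > /dev/null &
      done
      wait   # all four readers run concurrently until every read completes

      rm -f "$testfile"
      ```

      While the readers run, the data node's CPU usage can be watched with a tool
      such as top on the DataNode process.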
      Since all the blocks have a local replica, all the read requests went to the local data node.
      I observed that:

      The data node process's CPU usage was around 180% most of the time.

      The clients' CPU usage was moderate (as it should be).

      All the four disks were working concurrently with comparable read throughput.

      The total read throughput peaked at 90MB/sec, about 56% of the expected
      aggregate maximum read throughput of the 4 disks (160MB/sec). Thus the disks
      were not a bottleneck in this case.

      The data node's CPU usage seems unreasonably high.

    People

      Assignee: cdouglas Christopher Douglas
      Reporter: runping Runping Qi
