Hadoop Common / HADOOP-1149

DFS Scalability: high cpu usage in addStoredBlock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      I have seen addStoredBlock() consume a lot of CPU. One possible cause is that it invokes processOverReplicatedBlock(). The logic to find and purge over-replicated blocks could run much less often.

        Activity

        rangadi Raghu Angadi added a comment -

        Each addStoredBlock() invokes processOverReplicatedBlock(), which can easily be made a no-op in the common case. Currently it does some allocation and iteration, but that does not seem to be more work than the rest of addStoredBlock(). Did you find processOverReplicatedBlock() in the stack trace most of the time when you observed this?

        Let me know when you suspect this again, and we can check the trace.

        dhruba dhruba borthakur added a comment -

        I agree with your observation. I am surprised that processOverReplicatedBlock() was called so often on one of my test runs. If you think it is worthwhile, please go ahead and make the change so that processOverReplicatedBlock() is a no-op if blocks are not over-replicated.

        rangadi Raghu Angadi added a comment -

        Dhruba, the attached patch calls processOverReplicatedBlock() only if numCurReplica is larger than the expected number. It also has a couple of minor changes.

        Please review it, and maybe you could use this patch in your test case.
        Thanks.
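
        A minimal sketch of the guard described in this comment, not the actual patch. ReplicaTracker, its fields, the pruneCalls counter, and the policy of dropping the most recently added holder are all hypothetical stand-ins; only the names addStoredBlock, processOverReplicatedBlock, and numCurReplica come from the discussion.

        ```java
        import java.util.ArrayList;
        import java.util.List;

        /** Hypothetical stand-in for the namenode's per-block bookkeeping. */
        class ReplicaTracker {
            private final List<String> holders = new ArrayList<>();
            private final int expectedReplication;
            int pruneCalls = 0; // illustration only: counts runs of the expensive path

            ReplicaTracker(int expectedReplication) {
                this.expectedReplication = expectedReplication;
            }

            /** Records that a datanode stores the block, then prunes only if needed. */
            void addStoredBlock(String datanode) {
                if (!holders.contains(datanode)) {
                    holders.add(datanode);
                }
                int numCurReplica = holders.size();
                // Guard: skip the allocation and iteration in the common case.
                if (numCurReplica > expectedReplication) {
                    processOverReplicatedBlock();
                }
            }

            /** Assumed policy: drop the most recently added holders until at target. */
            private void processOverReplicatedBlock() {
                pruneCalls++;
                while (holders.size() > expectedReplication) {
                    holders.remove(holders.size() - 1);
                }
            }

            int replicaCount() {
                return holders.size();
            }
        }
        ```

        With an expected replication of 3, the first three calls to addStoredBlock() never reach processOverReplicatedBlock(); only a fourth, distinct datanode triggers it.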

        dhruba dhruba borthakur added a comment -

        +1. Code reviewed.

        rangadi Raghu Angadi added a comment -

        I will make the patch available after HADOOP-702 and friends are checked in.

        hadoopqa Hadoop QA added a comment - +1, because http://issues.apache.org/jira/secure/attachment/12354802/HADOOP-1149.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525290 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        tomwhite Tom White added a comment -

        Raghu, I think this one needs regenerating too due to FSNamesystem conflicts.

        rangadi Raghu Angadi added a comment -

        Hm. I tried this patch again and it applied fine to the latest trunk (revision 526263). Can you try it again?

        hadoopqa Hadoop QA added a comment - +1, because http://issues.apache.org/jira/secure/attachment/12354802/HADOOP-1149.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/526215 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        tomwhite Tom White added a comment -

        I've just committed this. Thanks Raghu!

        hadoopqa Hadoop QA added a comment - Integrated in Hadoop-Nightly #50 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/50/ )

          People

          • Assignee:
            rangadi Raghu Angadi
            Reporter:
            dhruba dhruba borthakur