Hadoop Common
HADOOP-1149

DFS Scalability: high cpu usage in addStoredBlock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      I have seen addStoredBlock() consume lots of CPU. One possible cause is that it invokes processOverReplicatedBlock(). The logic to find and purge over-replicated blocks can be done much less often.

        Activity

        dhruba borthakur created issue
        Raghu Angadi made changes:
          Assignee: Raghu Angadi [ rangadi ]
        Raghu Angadi added a comment -

        Each addStoredBlock() invokes processOverReplicatedBlock(), which can easily be made a no-op in the common case. Currently it does some allocation and iteration, but that does not seem to cost more than the rest of addStoredBlock(). Did you see processOverReplicatedBlock() in the stack trace most of the time when you observed this?

        Let me know when you suspect this again, and we can check the trace.

        dhruba borthakur added a comment -

        I agree with your observation. I am surprised that processOverReplicatedBlock() was called so often in one of my test runs. If you think it is worthwhile, please go ahead and make the change so that processOverReplicatedBlock() becomes a no-op if blocks are not over-replicated.

        Raghu Angadi added a comment -

        Dhruba, the attached patch calls processOverReplicatedBlock() only if numCurReplica is larger than the expected number. It also has a couple of minor changes.

        Please review it, and maybe you could use this patch in your test case.
        Thanks.

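        Below is a minimal sketch of the guard described in the comment above. It is not the actual HADOOP-1149 patch: the class and member names (BlockManagerSketch, BlockInfoSketch, expectedReplication) are simplified placeholders, and the real FSNamesystem code is considerably more involved.

            // Sketch only: call processOverReplicatedBlock() solely when the current
            // replica count exceeds the expected replication factor.
            import java.util.HashSet;
            import java.util.Set;

            class BlockManagerSketch {

                static class BlockInfoSketch {
                    final Set<String> datanodes = new HashSet<String>();
                    final int expectedReplication;   // target replication factor for the file

                    BlockInfoSketch(int expectedReplication) {
                        this.expectedReplication = expectedReplication;
                    }
                }

                /** Invoked when a datanode reports that it now stores the given block. */
                void addStoredBlock(BlockInfoSketch block, String datanode) {
                    block.datanodes.add(datanode);
                    int numCurReplica = block.datanodes.size();

                    // Previously this call happened unconditionally on every block report;
                    // the guard makes it a no-op in the common, non-over-replicated case.
                    if (numCurReplica > block.expectedReplication) {
                        processOverReplicatedBlock(block);
                    }
                }

                void processOverReplicatedBlock(BlockInfoSketch block) {
                    // Choose excess replicas to invalidate until the count matches
                    // expectedReplication (details omitted in this sketch).
                }
            }

        The intent of the change is simply that the allocation and iteration inside processOverReplicatedBlock() are skipped on the vast majority of block reports, where the replica count is at or below the target.
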
        Raghu Angadi made changes:
          Attachment: HADOOP-1149.patch [ 12354802 ]
        dhruba borthakur added a comment -

        +1. Code reviewed.

        Raghu Angadi added a comment -

        Will make the patch available after HADOOP-702 and friends are checked in.

        Raghu Angadi made changes:
          Fix Version/s: 0.13.0 [ 12312348 ]
          Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12354802/HADOOP-1149.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525290. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Tom White added a comment -

        Raghu, I think this one needs regenerating too due to FSNamesystem conflicts.

        Tom White made changes:
          Status: Patch Available [ 10002 ] → Open [ 1 ]
        Raghu Angadi added a comment -

        Hm, I tried this patch again and it applied fine to the latest trunk (revision 526263). Can you try it again?

        Raghu Angadi made changes:
          Status: Open [ 1 ] → Patch Available [ 10002 ]
        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12354802/HADOOP-1149.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/526215. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

        Tom White committed 526392 (2 files)
        Reviews: none

        HADOOP-1149. Improve DFS Scalability: make processOverReplicatedBlock() a no-op if blocks are not over-replicated. Contributed by Raghu Angadi.

        Tom White added a comment -

        I've just committed this. Thanks Raghu!

        Tom White made changes:
          Status: Patch Available [ 10002 ] → Resolved [ 5 ]
          Resolution: Fixed [ 1 ]
        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #50 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/50/ )

        Doug Cutting made changes:
          Status: Resolved [ 5 ] → Closed [ 6 ]
        Owen O'Malley made changes:
          Component/s: dfs [ 12310710 ]

          People

          • Assignee: Raghu Angadi
          • Reporter: dhruba borthakur
          • Votes: 0
          • Watchers: 0
