Hadoop Common / HADOOP-1149

DFS Scalability: high cpu usage in addStoredBlock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      I have seen addStoredBlock() consume a lot of CPU. One possible cause is that it invokes processOverReplicatedBlock(). The logic to find and purge over-replicated blocks could run much less often.

        Activity

        Transition                     Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Patch Available -> Open        2d 7h 11m               1                 Tom White       06/Apr/07 07:34
        Open -> Patch Available        12d 15h 37m             2                 Raghu Angadi    06/Apr/07 20:40
        Patch Available -> Resolved    12h 7m                  1                 Tom White       07/Apr/07 08:48
        Resolved -> Closed             62d 12h 52m             1                 Doug Cutting    08/Jun/07 21:40
        Owen O'Malley made changes -
        Component/s dfs [ 12310710 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hadoop QA added a comment - Integrated in Hadoop-Nightly #50 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/50/ )
        Tom White made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Tom White added a comment - I've just committed this. Thanks Raghu!
        Hadoop QA added a comment - +1, because http://issues.apache.org/jira/secure/attachment/12354802/HADOOP-1149.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/526215 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        Raghu Angadi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Raghu Angadi added a comment - Hm. I tried this patch again and it applied fine to the latest trunk (revision 526263). Can you try it again?
        Tom White made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Tom White added a comment - Raghu, I think this one needs regenerating too due to FSNamesystem conflicts.
        Hadoop QA added a comment - +1, because http://issues.apache.org/jira/secure/attachment/12354802/HADOOP-1149.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/525290 . Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch
        Raghu Angadi made changes -
        Fix Version/s 0.13.0 [ 12312348 ]
        Status Open [ 1 ] Patch Available [ 10002 ]
        Raghu Angadi added a comment - Will make the patch available after HADOOP-702 and friends are checked in.
        dhruba borthakur added a comment - +1. Code reviewed.
        Raghu Angadi made changes -
        Attachment HADOOP-1149.patch [ 12354802 ]
        Raghu Angadi added a comment - Dhruba, the attached patch calls processOverReplicatedBlock() only if numCurReplica is larger than the expected number of replicas. It also has a couple of minor changes. Please review it; maybe you could use this patch in your test case. Thanks.
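For readers following the thread, the guard described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the attached patch and not the real FSNamesystem code: only the names addStoredBlock, processOverReplicatedBlock and numCurReplica come from the discussion, while the class ReplicaGuardSketch, the single expectedReplication target, the string block and datanode IDs, and the overReplicationChecks counter are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the HADOOP-1149 patch or the real FSNamesystem.
class ReplicaGuardSketch {
    // Hypothetical simplification: one replication target shared by all blocks.
    private final int expectedReplication;
    private final Map<String, Set<String>> blockToNodes = new HashMap<>();
    // Counts how often the expensive over-replication path actually runs.
    int overReplicationChecks = 0;

    ReplicaGuardSketch(int expectedReplication) {
        this.expectedReplication = expectedReplication;
    }

    // Records that a datanode reported a replica of the given block.
    void addStoredBlock(String blockId, String datanode) {
        Set<String> nodes = blockToNodes.computeIfAbsent(blockId, k -> new HashSet<>());
        nodes.add(datanode);
        int numCurReplica = nodes.size();
        // Guard: only enter the purge logic when the block really has too many replicas.
        if (numCurReplica > expectedReplication) {
            processOverReplicatedBlock(nodes);
        }
    }

    // Stand-in for the real purge logic: drops arbitrary excess replicas.
    private void processOverReplicatedBlock(Set<String> nodes) {
        overReplicationChecks++;
        Iterator<String> it = nodes.iterator();
        while (nodes.size() > expectedReplication && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int replicaCount(String blockId) {
        Set<String> nodes = blockToNodes.get(blockId);
        return nodes == null ? 0 : nodes.size();
    }

    public static void main(String[] args) {
        ReplicaGuardSketch ns = new ReplicaGuardSketch(3);
        for (String dn : new String[] {"dn1", "dn2", "dn3", "dn4"}) {
            ns.addStoredBlock("blk_1", dn);
        }
        System.out.println("checks run: " + ns.overReplicationChecks);
        System.out.println("replicas kept: " + ns.replicaCount("blk_1"));
    }
}
```

In this sketch the common case (a block at or below its target) costs one comparison, and the allocation and iteration mentioned in the thread happen only when a replica count exceeds the target.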
        dhruba borthakur added a comment - I agree with your observation. I am surprised that processOverReplicatedBlock() was called so often in one of my test runs. If you think it is worthwhile, please go ahead and make the change so that processOverReplicatedBlock() is a no-op if blocks are not over-replicated.
        Raghu Angadi added a comment - Each addStoredBlock() call invokes processOverReplicatedBlock(), which can easily be made a no-op in the common case. Currently it does some allocation and iteration, but that does not seem to cost more than the rest of addStoredBlock(). Did you find processOverReplicatedBlock() in the stack trace most of the time when you observed this? Let me know when you suspect this again, and we can check the trace.
        Raghu Angadi made changes -
        Field Original Value New Value
        Assignee Raghu Angadi [ rangadi ]
        dhruba borthakur created issue -

          People

          • Assignee:
            Raghu Angadi
            Reporter:
            dhruba borthakur