Hadoop HDFS / HDFS-2362

More Improvements on NameNode Scalability

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: namenode
    • Labels: None

      Description

      This JIRA is an umbrella issue tracking the improvements we've made recently to the NameNode's performance, responsiveness, and hence scalability. These improvements include:
      1. Incremental block reports (HDFS-395)
      2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
      3. An upgradable lock that allows simultaneous read operations while reportDiff is in progress during block report processing (HDFS-2490)
      4. More CPU-efficient data structures for under-replicated/over-replicated/invalidate blocks (HDFS-2476)
      5. Increase the granularity of write operations in ReplicationMonitor, thus reducing contention for the write lock (HDFS-2495)
      6. Support variable block sizes
      7. Release RPC handlers while waiting for the edit log to be synced to disk
      8. Reduce network traffic pressure on the master rack where the NN is located by lowering the read priority of replicas on that rack
      9. A standalone KeepAlive heartbeat thread
      10. Reduce multiple traversals of the directory path to one for most namespace manipulations
      11. Move logging out of the write lock section
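      Items 5 and 11 share one idea: hold the NameNode's write lock for as short a time as possible. The Java sketch below illustrates that idea under stated assumptions; it is not the actual ReplicationMonitor or FSNamesystem code. The class BatchedWriteLockSketch, its queue of block IDs, the batchSize parameter, and the List<String> used as a log sink are all hypothetical stand-ins. Pending work is drained in bounded batches with the write lock re-acquired per batch, and log messages are buffered under the lock but emitted only after it is released.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch (not HDFS code): shorten write-lock hold times by
// processing queued work in bounded batches and logging outside the lock.
class BatchedWriteLockSketch {
  // Fair mode: threads that queue up while a batch runs are served
  // before this thread can re-acquire the write lock for the next batch.
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  private final Queue<String> pendingBlocks = new ArrayDeque<>();
  private final int batchSize;

  BatchedWriteLockSketch(int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  // Queue a (hypothetical) block ID for later processing.
  void enqueue(String blockId) {
    fsLock.writeLock().lock();
    try {
      pendingBlocks.add(blockId);
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  // Drains the queue; returns how many times the write lock was taken.
  int drain(List<String> logSink) {
    int lockAcquisitions = 0;
    boolean more;
    do {
      List<String> messages = new ArrayList<>();
      fsLock.writeLock().lock();
      lockAcquisitions++;
      try {
        // Hold the lock for at most batchSize items.
        for (int i = 0; i < batchSize && !pendingBlocks.isEmpty(); i++) {
          String block = pendingBlocks.poll();
          // A real system would update namespace/replication state here.
          messages.add("scheduled replication for " + block);
        }
        more = !pendingBlocks.isEmpty();
      } finally {
        fsLock.writeLock().unlock();
      }
      // Emit log output only after the write lock is released.
      logSink.addAll(messages);
    } while (more);
    return lockAcquisitions;
  }
}
```

      With batchSize = 4 and 10 queued blocks, drain acquires the write lock three times (4 + 4 + 2), so readers queued in the meantime can run between batches instead of waiting for all 10 blocks to be processed, and no log I/O happens while the lock is held.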

        Activity

        Tomasz Nykiel added a comment:

        I have a general question regarding the JUnit tests. I observed some bizarre behaviour: when running some tests, they fail with:

        Cannot lock storage /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1. The directory is already locked

        As a result, the MiniDFSCluster cannot initialize properly.

        I observed on my local machine that sometimes, probably after some earlier tests have failed, the datanode data directory is left in a strange state.

        For instance, a listing of hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1 shows the files inside (e.g., "current" and "in_use.lock", if I remember the name correctly) with "?" for all permissions and with "?" as the file owner and group. I am not sure why this happens. I don't think it is caused by any of my patches, since, for example, org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool.testDfsAdminDeleteBlockPool was previously failing for the same reason.

        Uma Maheswara Rao G added a comment:

        Can someone please merge these improvements to the 0.23 branch as well? They introduced a significant delta between trunk and 0.23, so we are not able to directly merge some other improvements, such as HDFS-1765.

        Uma Maheswara Rao G added a comment:

        I recall a recent mailing-list discussion between Eli and Dhruba about merging these NN scalability changes to 0.23.
        Are we planning this for the 0.23.1 release?

        Eli Collins added a comment:

        Not for 0.23.1, which is getting cut soon. We'll merge the protocol buffer (PB) changes (Jitendra has a branch for this) and the block report (BR) scalability changes once 0.23.1 has branched.

        Uma Maheswara Rao G added a comment:

        OK, thanks Eli.


          People

          • Assignee: Unassigned
          • Reporter: Hairong Kuang
          • Votes: 0
          • Watchers: 22
