Hadoop HDFS / HDFS-14536

It's safe to remove synchronized from reconcile scan


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0, 2.7.0
    • Fix Version/s: None
    • Component/s: datanode
    • Labels: None

    Description

      I found all my requests blocked every 6 hours, and used jstack to dump the threads:

      "1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 tid=0x00007f7ed01c3800 nid=0x1a1c waiting for monitor entry [0x00007f7e9e42b000]
        java.lang.Thread.State: BLOCKED (on object monitor)
          at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
          - waiting to lock <0x00000002c00410f8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
          at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)
      
      // ... many more threads BLOCKED like the above, for about one minute
      
      "java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty queue]" #77 daemon prio=5 os_prio=0 tid=0x00007f7ed8590800 nid=0x49cc runnable [0x00007f7ea0344000]
        java.lang.Thread.State: RUNNABLE
          at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
          at org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
          at org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
          at org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
          at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
          - locked <0x00000002c00410f8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
          at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
          - locked <0x00000002c00410f8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
          at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
          at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
      

      This problem is likely related to the large number of blocks on each datanode.
      After reviewing the DirectoryScanner behavior, I think the lock synchronized(dataset) in scan() can be removed safely.

      Since dataset.getFinalizedBlocks() is itself a synchronized method, the memReport snapshot is generated from volumeMap under the dataset lock. After that, the outer synchronized(dataset) is needless, yet I have seen the subsequent while-loop run for 20 seconds while holding it, blocking other threads.
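      For reference, getFinalizedBlocks() looks approximately like this in branch-2 (a simplified excerpt; see FsDatasetImpl for the exact code):

        // FsDatasetImpl (simplified): the method is synchronized, so the
        // snapshot of volumeMap is taken under the dataset monitor even
        // without the outer synchronized(dataset) block in scan().
        public synchronized List<FinalizedReplica> getFinalizedBlocks(String bpid) {
          ArrayList<FinalizedReplica> finalized =
              new ArrayList<FinalizedReplica>(volumeMap.size(bpid));
          for (ReplicaInfo b : volumeMap.replicas(bpid)) {
            if (b.getState() == ReplicaState.FINALIZED) {
              // copies each replica, which is why FinalizedReplica.<init>
              // shows up in the jstack output above
              finalized.add(new FinalizedReplica((FinalizedReplica) b));
            }
          }
          return finalized;
        }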

      Yes, other blocks can be added to or deleted from memory or disk in the meantime, but that does not affect computing the differences between the two reports, because the later fix-up processing handles both sides.

       

        void scan() {
          clear();
          Map<String, ScanInfo[]> diskReport = getDiskReport();
      
          // Hold FSDataset lock to prevent further changes to the block map
          synchronized(dataset) {  // remove synchronized?
            for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
              String bpid = entry.getKey();
              ScanInfo[] blockpoolReport = entry.getValue();
      
              Stats statsRecord = new Stats(bpid);
              stats.put(bpid, statsRecord);
              LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
              diffs.put(bpid, diffRecord);
      
              statsRecord.totalBlocks = blockpoolReport.length;
              // getFinalizedBlocks is synchronized 
              List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
              FinalizedReplica[] memReport = bl.toArray(new FinalizedReplica[bl.size()]);
              Arrays.sort(memReport); // Sort based on blockId
      
              int d = 0; // index for blockpoolReport
              int m = 0; // index for memReport
              while (m < memReport.length && d < blockpoolReport.length) {
                // compare the two sorted reports and record differences;
                // this loop may take ~20s while holding the dataset lock
              }
            } //end for
          } //end synchronized
        }
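      A minimal sketch of the change being proposed (illustrative only; the attached HDFS-14536.00.patch is the authoritative version): drop the outer lock and rely on the synchronized getFinalizedBlocks() alone for a consistent snapshot.

        void scan() {
          clear();
          Map<String, ScanInfo[]> diskReport = getDiskReport();

          // No outer synchronized(dataset): getFinalizedBlocks(bpid) is
          // itself synchronized, so the in-memory snapshot is still taken
          // atomically, and the long comparison loop below no longer blocks
          // every other user of the dataset lock.
          for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
            String bpid = entry.getKey();
            ScanInfo[] blockpoolReport = entry.getValue();

            Stats statsRecord = new Stats(bpid);
            stats.put(bpid, statsRecord);
            LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
            diffs.put(bpid, diffRecord);

            statsRecord.totalBlocks = blockpoolReport.length;
            List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
            FinalizedReplica[] memReport = bl.toArray(new FinalizedReplica[bl.size()]);
            Arrays.sort(memReport); // Sort based on blockId

            int d = 0; // index for blockpoolReport
            int m = 0; // index for memReport
            while (m < memReport.length && d < blockpoolReport.length) {
              // same comparison loop as before, now without the dataset lock
            }
          }
        }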

       

      Say block A is added and block B is deleted after dataset.getFinalizedBlocks(bpid) is called; the disk files and the in-memory map are both updated. Because we took a snapshot of the in-memory finalized blocks beforehand, we only care about the consistency of the blocks in that snapshot.

      If the difference report happens to contain the deleted block B, the later checkAndUpdate will handle it correctly.
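      To make that concrete, here is a self-contained toy model of the diff (hypothetical names, plain block ids instead of replica objects): block 2 is deleted from disk and block 4 is added after the memory snapshot is taken; both simply land in the diff, and the later per-entry re-check against the current state resolves them.

        import java.util.ArrayList;
        import java.util.List;

        // Toy model of DirectoryScanner's diff over two sorted reports. Any
        // entry in the diff is only a candidate: checkAndUpdate re-validates
        // it against the *current* block map, so stale entries caused by
        // concurrent adds/deletes are harmless.
        public class ScanDiffDemo {
          public static void main(String[] args) {
            long[] memReport  = {1, 2, 3};  // in-memory snapshot (block 2 deleted later)
            long[] diskReport = {1, 3, 4};  // on-disk report (block 4 added later)

            List<Long> diff = new ArrayList<Long>();
            int m = 0, d = 0;
            while (m < memReport.length && d < diskReport.length) {
              if (memReport[m] == diskReport[d]) {        // consistent on both sides
                m++; d++;
              } else if (memReport[m] < diskReport[d]) {  // in memory, missing on disk
                diff.add(memReport[m++]);
              } else {                                    // on disk, missing in memory
                diff.add(diskReport[d++]);
              }
            }
            while (m < memReport.length) diff.add(memReport[m++]);
            while (d < diskReport.length) diff.add(diskReport[d++]);

            System.out.println(diff); // [2, 4] -- both re-checked later, not acted on blindly
          }
        }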

      I've compiled this patch and will test it in my environment.

      Attachments

        1. HDFS-14536.00.patch
          2 kB
          Sean Chow
        2. reconcile-with-synchronized.png
          60 kB
          Sean Chow
        3. reconcile-with-removing-synchronized.png
          59 kB
          Sean Chow


          People

            Assignee: Unassigned
            Reporter: Sean Chow (seanlook)
            Votes: 0
            Watchers: 1
