  1. Hadoop Distributed Data Store
  2. HDDS-3459

Datanode uses a single thread to process commands from SCM


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Datanode
    • Labels:
      None
    • Target Version/s:

      Description

      What's the problem?
      The datanode uses a single command-processor thread to process commands from SCM. If that thread blocks for a long time, no other SCM command can be processed until it is unblocked, which can cause further problems.

      For example, consider a group with a leader, follower1, and follower2. The steps to reproduce the problem are as follows:
      1. Some datanodes crash, and follower2 begins streaming container data to another datanode. The command-processor thread then blocks at cont.writeLock() when it tries to delete a block, because streaming container data holds the container's RwLock.
      2. follower2 reports close pipeline.
      3. SCM sends the close pipeline command.
      4. The leader and follower1 remove the group, but follower2 cannot remove it because its command-processor thread is blocked.
      5. follower2 then runs LeaderElection for about 12 hours, until the container data streaming finishes and releases the RwLock. During that time the leader and follower1 respond with "group not found".
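
The blocking pattern in the steps above can be reproduced in isolation. The following is a minimal sketch, not Ozone code: the class name, method name, and the two submitted tasks are hypothetical stand-ins for the delete-block and close-pipeline commands, and the calling thread plays the role of the container streaming that holds the read lock. It only assumes standard `java.util.concurrent` behavior.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; none of these names come from Ozone.
class SingleThreadCommandBlocking {

  /**
   * Returns true if the "close pipeline" command ran while the container
   * read lock was still held by the (simulated) streaming operation.
   */
  static boolean closePipelineRanWhileLocked(int handlerThreads) {
    ReentrantReadWriteLock containerLock = new ReentrantReadWriteLock();
    ExecutorService commandProcessor =
        Executors.newFixedThreadPool(handlerThreads);
    CountDownLatch closePipelineDone = new CountDownLatch(1);

    // Simulated streaming: the calling thread holds the read lock.
    containerLock.readLock().lock();
    try {
      // Command 1: "delete block" needs the write lock, so it blocks
      // for as long as the read lock is held.
      commandProcessor.submit(() -> {
        containerLock.writeLock().lock();
        try {
          // block deletion would happen here
        } finally {
          containerLock.writeLock().unlock();
        }
      });

      // Command 2: "close pipeline" does not need the container lock.
      commandProcessor.submit(closePipelineDone::countDown);

      // Wait briefly to see whether command 2 gets a chance to run.
      return closePipelineDone.await(500, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      // Releasing the read lock lets command 1 (and anything queued
      // behind it) finish, so the executor can shut down cleanly.
      containerLock.readLock().unlock();
      commandProcessor.shutdown();
    }
  }

  public static void main(String[] args) {
    // One handler thread: close pipeline is stuck behind delete block.
    System.out.println(closePipelineRanWhileLocked(1)); // false
    // Two handler threads: close pipeline runs on the second thread.
    System.out.println(closePipelineRanWhileLocked(2)); // true
  }
}
```

With one handler thread the second command never starts while the lock is held, which matches follower2 being unable to remove the group. The two-thread case is shown only to contrast the behavior; it is not a proposed fix for this issue.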

      You can see this in the following screenshots.
      1. follower2 begins streaming container data at 2020-04-17 23:38:39


      2. follower2 reports close pipeline at 2020-04-18 01:14:39

      3. SCM sends the close pipeline command

      4. The leader removes the group

      follower1 removes the group

      5. follower2 then runs LeaderElection for about 12 hours, until 2020-04-18 13:06:20.

        Attachments

        1. screenshot-1.png
          123 kB
          runzhiwang
        2. screenshot-2.png
          26 kB
          runzhiwang
        3. screenshot-3.png
          102 kB
          runzhiwang
        4. screenshot-4.png
          77 kB
          runzhiwang
        5. screenshot-5.png
          21 kB
          runzhiwang
        6. screenshot-6.png
          21 kB
          runzhiwang
        7. screenshot-7.png
          45 kB
          runzhiwang


            People

            • Assignee:
              yjxxtd runzhiwang
              Reporter:
              yjxxtd runzhiwang