  1. Hadoop HDFS
  2. HDFS-10576 DiskBalancer followup work items
  3. HDFS-10904

Need a new Result state for DiskBalancerWorkStatus to indicate the final Plan step errors and stuck rebalancing

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • 3.0.0-alpha2
    • 2.9.0
    • balancer & mover
    • None

    Description

      • A DiskBalancer NodePlan might include a single MoveStep or a list of MoveSteps to perform the requested disk balancing operation.
      • DiskBalancerWorkStatus tracks the current disk balancing operation status for the Plan just submitted.
      • DiskBalancerWorkStatus#Result has the following states, and the state transitions of currentResult do not appear to be driven entirely by the disk balancing operation itself. In particular, the transition to PLAN_DONE happens only when queryWorkStatus() is called, which can be improved.
          /** Various result values. **/
          public enum Result {
            NO_PLAN(0),
            PLAN_UNDER_PROGRESS(1),
            PLAN_DONE(2),
            PLAN_CANCELLED(3);
        
        DiskBalancer
            cancelPlan(String)
                this.currentResult = Result.PLAN_CANCELLED;
            DiskBalancer(String, Configuration, BlockMover)
                this.currentResult = Result.NO_PLAN;
            queryWorkStatus()
                this.currentResult = Result.PLAN_DONE;
            shutdown()
                this.currentResult = Result.NO_PLAN;
                this.currentResult = Result.PLAN_CANCELLED;
            submitPlan(String, long, String, String, boolean)
                this.currentResult = Result.PLAN_UNDER_PROGRESS;
        
      • More importantly, when the final MoveStep of the NodePlan fails, the currentResult state is stuck in PLAN_UNDER_PROGRESS forever. A user querying the status will assume the operation is in progress when in reality it is not making any progress. The user can also run the Query command with the verbose option, which displays more details about the operation, including any errors encountered.
        • Query Output:
          Plan File:  <_file_path_>
          Plan ID: <_plan_hash_>
          Result: PLAN_UNDER_PROGRESS
          
        • "sourcePath" : "/data/disk2/hdfs/dn",
            "destPath" : "/data/disk3/hdfs/dn",
            "workItem" :
              .. .. ..
              "errorCount" : 0,
              "errMsg" : null,
              .. .. 
              "maxDiskErrors" : 5,
              .. .. ..
          
        • However, the user has to decipher these details to work out that the disk balancing operation is stuck, because the top-level Result still says PLAN_UNDER_PROGRESS. So we want the DiskBalancer to differentiate between an in-progress operation and an operation that is stuck or has failed on its final step.
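
One way to picture the request is a toy model in which the result state is advanced by the balancing work itself rather than by the query. The sketch below is not the Hadoop DiskBalancer class: PLAN_ERROR (and its code 4), onMoveError(), onPlanFinished(), and the rule "errorCount >= maxDiskErrors means the plan has failed" are all assumptions made for illustration. Only the four existing Result values and the errorCount / maxDiskErrors field names come from the description above.

```java
/** Toy model of the result tracking discussed in this issue; not the Hadoop class. */
class ResultStateSketch {

  /** The four existing values plus PLAN_ERROR, a hypothetical addition. */
  enum Result {
    NO_PLAN(0),
    PLAN_UNDER_PROGRESS(1),
    PLAN_DONE(2),
    PLAN_CANCELLED(3),
    PLAN_ERROR(4); // hypothetical: name and code are assumptions

    private final int code;

    Result(int code) {
      this.code = code;
    }

    int getCode() {
      return code;
    }
  }

  private Result currentResult = Result.NO_PLAN;
  private final long maxDiskErrors; // mirrors "maxDiskErrors" in the query output
  private long errorCount;          // mirrors "errorCount" in the query output

  ResultStateSketch(long maxDiskErrors) {
    this.maxDiskErrors = maxDiskErrors;
  }

  synchronized void submitPlan() {
    errorCount = 0;
    currentResult = Result.PLAN_UNDER_PROGRESS;
  }

  synchronized void cancelPlan() {
    if (currentResult == Result.PLAN_UNDER_PROGRESS) {
      currentResult = Result.PLAN_CANCELLED;
    }
  }

  /** Hypothetical hook: the worker reports a failed move. */
  synchronized void onMoveError() {
    if (currentResult != Result.PLAN_UNDER_PROGRESS) {
      return;
    }
    errorCount++;
    // Assumed rule: reaching the error budget ends the plan with an error.
    if (errorCount >= maxDiskErrors) {
      currentResult = Result.PLAN_ERROR;
    }
  }

  /** Hypothetical hook: the worker reports that the final step completed. */
  synchronized void onPlanFinished() {
    if (currentResult == Result.PLAN_UNDER_PROGRESS) {
      currentResult = Result.PLAN_DONE;
    }
  }

  /** The query only reads the state; it never changes it. */
  synchronized Result queryWorkStatus() {
    return currentResult;
  }

  public static void main(String[] args) {
    ResultStateSketch balancer = new ResultStateSketch(5);
    balancer.submitPlan();
    for (int i = 0; i < 5; i++) {
      balancer.onMoveError();
    }
    System.out.println("Result: " + balancer.queryWorkStatus());
  }
}
```

With a terminal error state like this, the top-level Result alone would tell the user that the plan is no longer making progress, without having to read errorCount and maxDiskErrors out of the verbose output.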

          People

            manojg Manoj Govindassamy