SLIDER-1161

Improve regionserver status check in HBase Slider app package



    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Slider 0.80
    • None
    • app-package
    • None
    • RHEL-6 (64 Bit)

    • Important


      PROBLEM:

      We use Slider to launch HBase containers.
      The problem statement and details follow:
      1. Assume a region server goes into a long pause and loses its heartbeat with ZooKeeper.
      2. HMaster notices this and marks the region server as DEAD.
      3. However, the Slider agent continues to 'ps' the region server process every heartbeat.monitor.interval (45000 ms in my case). Because it only checks that the region server process is alive, it does not consider the region server dead.
      4. After the long pause, the region server finally recovers and reports to HMaster.
      5. HMaster rejects the region server with YouAreAlreadyDeadException.
      6. The region server then shuts itself down, and Slider notices that the process is no longer running.
      7. Slider launches a new region server.
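
To illustrate step 3, here is a minimal sketch of a process-only liveness check. It is not the Slider agent's actual code: `process_alive` is a hypothetical name, and the sketch assumes Python 3 on a POSIX system. A check of this kind keeps reporting the region server as alive for as long as its PID exists, whatever HMaster thinks of it.

```python
import os


def process_alive(pid):
    """Return True if a process with this PID exists (POSIX only).

    This only proves that the PID is present. A process that is stuck in
    a long pause still has a PID, so it still counts as alive here.
    """
    try:
        # Signal 0 delivers nothing; it only performs the existence and
        # permission checks for the target PID.
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```
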

      The issue, as the steps above show, is that there can be a huge delay between steps 4 and 6. During that time we are operating with fewer region servers, which puts more load on the remaining ones.

      The issue could be solved if Slider synced with HMaster to find out whether a region server is alive. That way, it would know immediately that HMaster has already marked a region server as dead, and could then bring down that region server and launch a new one.
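
The proposed check could look roughly like the sketch below, which combines the process check with the master's view of the cluster. Everything here is an assumption made for illustration: the function names and status strings are hypothetical, the cluster status is supplied by a caller-provided `fetch_cluster_status` callable, and the JSON shape (a `DeadNodes` list of server names) is modeled on what the HBase REST gateway is understood to return for `/status/cluster`. How Slider would actually obtain this information from HMaster is left open.

```python
import json


def parse_dead_hosts(cluster_status_json):
    """Return the lowercase hostnames listed under "DeadNodes".

    Assumption: each entry looks like "host:port" or
    "host,port,startcode"; only the host part is kept.
    """
    status = json.loads(cluster_status_json)
    hosts = set()
    for name in status.get("DeadNodes") or []:
        host = name.replace(",", ":").split(":")[0]
        hosts.add(host.lower())
    return hosts


def regionserver_status(process_is_alive, hostname, fetch_cluster_status):
    """Combine the local process check with the master's dead-server list.

    process_is_alive: result of the existing process-only check.
    hostname: host the region server runs on.
    fetch_cluster_status: hypothetical callable returning the cluster
        status as a JSON string.
    """
    if not process_is_alive:
        return "DEAD"
    try:
        dead_hosts = parse_dead_hosts(fetch_cluster_status())
    except (OSError, ValueError):
        # Status source unreachable or payload unparseable: fall back to
        # the process-only answer rather than restarting a healthy server.
        return "ALIVE"
    if hostname.lower() in dead_hosts:
        # The process is running, but the master has given up on it.
        return "DEAD_ON_MASTER"
    return "ALIVE"
```

With a check like this, a `DEAD_ON_MASTER` result would let the agent stop the region server and request a replacement right away instead of waiting for the process to exit on its own. Matching on hostname alone is a simplification: a real check would probably also need to compare the server's port and start code, so that a freshly restarted region server on the same host is not mistaken for its dead predecessor.
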




            Assignee: Unassigned
            Reporter: Sandeep Nemuri