  1. HBase
  2. HBASE-20828

Finish-up AMv2 Design/List of Tenets/Specification of operation

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: amv2
    • Labels: None

    Description

      AMv2 is missing a specification; too many grey areas remain. Also missing is a concise listing of the tenets of AMv2 operation. Here are some examples:

      • HBASE-19529 "Handle null states in AM": Asks how we should treat a null state in hbase:meta. What does it 'mean'? We seem to treat it differently depending on context. Needs clarification. Apache9 recently asked a similar question about the meaning of OFFLINE.
      • Logging needs to have a particular form to help trace Procedure progress; needs a write-up.

      Let's fill in items to address in this umbrella issue. We can address them in sub-issues and produce a specification doc too. We have the below, but these are mostly (incomplete) descriptions for devs of pv2 and amv2; the specification is missing:

      http://hbase.apache.org/book.html#pv2
      http://hbase.apache.org/book.html#amv2

      (Other areas include addressing what is up w/ rollback – when, how much, and when it is not appropriate – as well as recommendations on Procedure coarseness and locking – is it OK to lock the table in an alter table procedure for the life of the procedure? – and so on.)

      Sub-Tasks

        1. Handle null states in AM | Sub-task | Resolved | Unassigned
        2. Restore procedure locks when master restarts | Sub-task | Resolved | Duo Zhang
        3. The parent procedure of RegionTransitionProcedure may not have the table lock | Sub-task | Resolved | Duo Zhang
        4. Merged region's RIT state may not be cleaned after master restart | Sub-task | Resolved | Allan Yang
        5. RS was killed due to master thought the region should be on a already dead server | Sub-task | Closed | Allan Yang
        6. Data loss if merging regions while ServerCrashProcedure executing | Sub-task | Resolved | Allan Yang
        7. Introduce a region transition procedure to handle all the state transition for a region | Sub-task | Resolved | Duo Zhang
        8. RS may get killed while master restarts | Sub-task | Resolved | Allan Yang
        9. Data loss if splitting region while ServerCrashProcedure executing | Sub-task | Resolved | Allan Yang
        10. Merge the update procedure store on locking with the general persist after a procedure execution | Sub-task | Open | Unassigned
        11. Possible NPE in ReopenTableRegionsProcedure | Sub-task | Resolved | Allan Yang
        12. There will be race when we call suspendIfNotReady and then throw ProcedureSuspendedException | Sub-task | Resolved | Duo Zhang
        13. Split/Merge table can be executed concurrently with DisableTableProcedure | Sub-task | Resolved | Unassigned
        14. ArrayIndexOutOfBoundsException when rolling back procedure | Sub-task | Resolved | Duo Zhang
        15. Lock may not be taken or released while rolling back procedure | Sub-task | Resolved | Allan Yang
        16. SCP can be scheduled multiple times for the same RS | Sub-task | Resolved | Allan Yang
        17. One operation in procedure batch throws an exception will cause all RegionTransitionProcedures receive the same exception | Sub-task | Resolved | Allan Yang
        18. Find another way to test the backoff mechanism in TRSP | Sub-task | Resolved | Unassigned
        19. Revisit the expected states for open/close | Sub-task | Resolved | Duo Zhang
        20. Add cache for TableStateManager | Sub-task | Resolved | Duo Zhang
        21. Meta Table should be able to online even if all procedures are lost | Sub-task | Resolved | Allan Yang
        22. Procedure worker should not quit when getting unexpected error | Sub-task | Patch Available | Allan Yang
        23. Exclusive lock may be held by a SUCCESS state procedure forever | Sub-task | Resolved | Allan Yang
        24. Possible NPE if ModifyTable and region split happen at the same time | Sub-task | Resolved | Allan Yang
        25. Confirm that we can (rolling) upgrade from 2.0.x and 2.1.x to 2.2.x after HBASE-20881 | Sub-task | Resolved | Duo Zhang
        26. Reimplement assign/unassign related procedure metrics | Sub-task | Resolved | Duo Zhang
        27. Introduce a mechanism to bypass the execution of a stuck procedure | Sub-task | Resolved | Allan Yang
        28. Adding getter methods to some private fields in ProcedureV2 module | Sub-task | Resolved | Allan Yang
        29. Move TestCreateTableProcedure.testMRegions to a separated file | Sub-task | Resolved | Duo Zhang
        30. Remove the explicit timeout config for TestTruncateTableProcedure | Sub-task | Resolved | Duo Zhang
        31. The timeout retry logic for several procedures are broken after master restarts | Sub-task | Resolved | Duo Zhang
        32. ReopenTableRegionsProcedure sometimes hangs | Sub-task | Resolved | Unassigned
        33. Reimplement the retry backoff logic for ReopenTableRegionsProcedure | Sub-task | Resolved | Duo Zhang
        34. Race in region opening and load balancing can cause region stuck in RIT | Sub-task | Resolved | Duo Zhang
        35. Revisit the executeProcedure method for open/close region | Sub-task | Resolved | Duo Zhang
        36. Revisit the close region related code at RS side | Sub-task | Open | Unassigned
        37. Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217 | Sub-task | Resolved | Duo Zhang
        38. Allow the procedure implementation to skip persistence of the state after a execution | Sub-task | Resolved | Duo Zhang
        39. Rename the closed procedure wal files so that we do not need to call recoverLease when restarting | Sub-task | Resolved | Unassigned
        40. Load procedure wals with multiple threads | Sub-task | Resolved | Unassigned
        41. Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS | Sub-task | Resolved | Allan Yang
        42. Skip persistence when retrying for assignment related procedures | Sub-task | Resolved | Duo Zhang
        43. Add jitter for ProcedureUtil.getBackoffTimeMs | Sub-task | Resolved | Yi Mei
        44. Refactor WALProcedureStore and add more comments for better understanding the implementation | Sub-task | Resolved | Duo Zhang
        45. Need to find a way to limit the number of proc wal files | Sub-task | Resolved | Duo Zhang
        46. Add more tests for procedure store related classes | Sub-task | Resolved | Unassigned
        47. HostingServer in UnassignProcedure is not accurate | Sub-task | Resolved | Allan Yang
        48. Do not rollback successful sub procedures when rolling back a procedure | Sub-task | Resolved | Duo Zhang
        49. TransitRegionStateProcedure should not fail with FAILED_OPEN when acting as a sub procedure | Sub-task | Open | Unassigned
        50. The implementation of BitSetNode is not efficient | Sub-task | Resolved | Duo Zhang
        51. The getActiveMinProcId and getActiveMaxProcId of BitSetNode are incorrect if there are no active procedure | Sub-task | Resolved | Duo Zhang
        52. Backport HBASE-21278 to branch-2.1 and branch-2.0 | Sub-task | Resolved | Michael Stack
        53. Should not skip force updating for a sub procedure even if it has been finished | Sub-task | Resolved | Duo Zhang
        54. ReopenTableRegionsProcedure will enter an infinite loop if we schedule a TRSP at the same time | Sub-task | Resolved | Duo Zhang
        55. Simplify the implementation of WALProcedureMap | Sub-task | Resolved | Duo Zhang
        56. The force update thread may have race with PE worker when the procedure is rolling back | Sub-task | Resolved | Duo Zhang
        57. Polish the rollback implementation in ProcedureExecutor | Sub-task | Open | Duo Zhang
        58. Procedure may be deleted improperly during master restarts resulting in 'Corrupt' | Sub-task | Resolved | Allan Yang
        59. Rewrite the buildingHoldCleanupTracker method in WALProcedureStore | Sub-task | Resolved | Duo Zhang
        60. Procedure holds the lock should put to front of the queue after restart | Sub-task | Resolved | Allan Yang
        61. Revisit the lock and queue implementation in MasterProcedureScheduler | Sub-task | Resolved | Duo Zhang
        62. Add some verbose log to MasterProcedureScheduler | Sub-task | Resolved | Duo Zhang
        63. Add debug log for procedure stack id related operations | Sub-task | Resolved | Duo Zhang
        64. Procedure with holdlock=false should not be restored lock when restarts | Sub-task | Resolved | Allan Yang
        65. Abort split/merge procedure if there is a table procedure of the same table going on | Sub-task | Resolved | Allan Yang
        66. Do not kill RS if reportOnlineRegions fails | Sub-task | Resolved | Allan Yang
        67. Procedures for meta table/region should be able to execute in separate workers | Sub-task | Resolved | Allan Yang
        68. The checkOnlineRegionsReport can accidentally complete a TRSP | Sub-task | Resolved | Duo Zhang
        69. Retry on reportRegionStateTransition can lead to unexpected errors | Sub-task | Resolved | Duo Zhang
        70. Add more comments about how we do fencing for WALProcedureStore | Sub-task | Resolved | Unassigned
        71. Should not persist the dispatched field for RegionRemoteProcedureBase | Sub-task | Resolved | Duo Zhang
        72. Backport "HBASE-21463 The checkOnlineRegionsReport can accidentally complete a TRSP" to branch-2.1 and branch-2.0 | Sub-task | Resolved | Unassigned
        73. WALProcedure may remove proc wal files still with active procedures | Sub-task | Resolved | Duo Zhang
        74. Ignore the reportRegionStateTransition call from a dead server | Sub-task | Resolved | Duo Zhang

          People

            Assignee: Unassigned
            Reporter: Michael Stack (stack)
            Votes: 0
            Watchers: 12
