  1. Geode
  2. GEODE-10409

Rebalance Model Missing Collocated Regions At Server Startup

Details

    Description

      The following steps reproduce the issue:

      Run the start.gfsh script in the attached example. It configures a Geode system with a partitioned region, a gateway sender, and a region collocated with the partitioned region. There are therefore three regions in total: the leader region, the collocated region and the queue region.

      Then run the example code, which loads ~400M of data into the system and generates about 5 times that amount of events.

      Then stop one of the servers, and revoke the disk files of that server.

      Then start the server again, which triggers bucket recovery.

      In the attached log (line 596, line 598 and line 5958), we can see that the queue region is not included in the rebalance model, in either the data size column or the max size column.

      Then run a manual rebalance after the server is up. This time the log shows that the queue region is added to the model (line 6010, line 6012, line 6014 and line 6028).

       

      This inconsistent behavior has two negative consequences:

      1) The rebalance result differs between server startup and a manual trigger. The startup rebalance reports that everything is OK and that the rebalance finished, but the manually triggered rebalance reports that there is not enough space, because its model includes the queue region, which holds 5 times as much data as the leader region.

      2) A mismatch between the rebalance model and the data actually being rebalanced: the queue region data is in fact rebalanced during server startup, even though the region is not included in the model.
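
      To make the two consequences concrete, here is a minimal, self-contained toy model. It is not Geode's rebalance code. The class name, the method names, the region names, the collocated region's size and the capacity figure are all hypothetical. Only the ratio between the queue region and the leader region (about 5 to 1) comes from the description above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT Geode's rebalance model implementation.
public class RebalanceModelSketch {

    /** Sums the sizes of the regions that a given model includes. */
    static long modeledLoad(Map<String, Long> regionSizes, List<String> includedRegions) {
        long total = 0;
        for (String region : includedRegions) {
            total += regionSizes.getOrDefault(region, 0L);
        }
        return total;
    }

    /** Toy space check: the modeled load must not exceed the capacity. */
    static boolean fits(long modeledLoad, long capacity) {
        return modeledLoad <= capacity;
    }

    public static void main(String[] args) {
        // Sizes in arbitrary units. Only the 5x queue/leader ratio is from the report.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("leader", 400L);
        sizes.put("collocated", 400L); // assumed value, not stated in the report
        sizes.put("queue", 2000L);     // 5 times the leader region

        long capacity = 1000L;         // hypothetical capacity

        // Model as seen at server startup: the queue region is missing.
        List<String> startupModel = Arrays.asList("leader", "collocated");
        // Model as seen on manual rebalance: the queue region is included.
        List<String> manualModel = Arrays.asList("leader", "collocated", "queue");

        long startupLoad = modeledLoad(sizes, startupModel);
        long manualLoad = modeledLoad(sizes, manualModel);

        // Consequence 1: the same system passes one check and fails the other.
        System.out.println("startup: load=" + startupLoad + " fits=" + fits(startupLoad, capacity));
        System.out.println("manual:  load=" + manualLoad + " fits=" + fits(manualLoad, capacity));

        // Consequence 2: data is moved for all three regions either way,
        // so the startup model understates the data actually rebalanced.
        System.out.println("unmodeled at startup: " + (manualLoad - startupLoad));
    }
}
```

      With these assumed numbers, the startup model sees a load of 800 and passes the space check, while the manual model sees 2800 and fails it, although the data on disk is identical in both cases.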

      Attachments

        1. test.tar.gz
          6 kB
          Weijie Xu
        2. server2.log
          984 kB
          Weijie Xu

            People

              WeijieEST Weijie Xu