  1. Kudu
  2. KUDU-3452

Support creating three-replicas table or partition when only 2 tservers healthy


Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.18.0
    • None
    • None

    Description

      Background

      In my case, a new Kudu table (history_data_table) is created every day to store historical data, and a new partition is added to another table (business_data_table) to hold the current day's data. These tables and partitions all require 3 replicas. This logic is implemented in Python scripts. My Kudu cluster has 3 masters and 3 tservers. The flag --catalog_manager_check_ts_count_for_create_table is set to false.

      Sometimes one tserver becomes unavailable. The table creation task then retries continuously and keeps failing until the tserver becomes healthy again. The master logs this error:

      E0222 11:10:32.767140 3321 catalog_manager.cc:672] Error processing pending assignments: Invalid argument: error selecting replicas for tablet 41dffa9783f14f36a5b6c35e89075c1a, state:0: Not enough tablet servers are online for table 'test_table'. Need at least 3 replicas, but only 2 tablet servers are available

      Because there are not enough tablet servers to host all the replicas, the tablet is never created and never reaches the running state. Reads and writes to this tablet therefore fail, even though 2 tservers are available and could host 2 replicas.

       

      An already created tablet can keep serving requests even if one of its 3 replicas becomes unavailable. Why, then, can a three-replica table not be created when only 2 tservers are healthy?
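      The intuition behind this question is simple majority arithmetic. Kudu replicates tablets with Raft, where an operation commits once a majority of the configured replicas accept it. The sketch below only illustrates that rule; the function names are made up for this example and are not Kudu APIs.

```python
def majority_size(num_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if num_replicas < 1:
        raise ValueError("num_replicas must be positive")
    return num_replicas // 2 + 1


def has_quorum(num_replicas: int, live_replicas: int) -> bool:
    """True if the live replicas can still form a majority."""
    return live_replicas >= majority_size(num_replicas)


# A 3-replica tablet with 2 live replicas still has a majority (2 of 3).
print(has_quorum(3, 2))
# With only 1 live replica it does not.
print(has_quorum(3, 1))
```

      By the same arithmetic, 2 healthy tservers are enough to form a majority for a tablet configured with 3 replicas.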

       

      Besides, a valid table creation task is affected by other, invalid tasks. In the example above, a table creation task with RF=1 still does not succeed even though more than one tablet server is alive. This happens because the background task manager aborts the whole pass as soon as it finds a failed tablet creation task, then starts a new pass that tries to execute all tasks again.
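      A toy model of this interaction may make it clearer. It is not Kudu code: try_create, the two processing functions, and the table name small_table are hypothetical, and replica selection is reduced to comparing the replication factor with the number of live tservers.

```python
from typing import List, Tuple

LIVE_TSERVERS = 2  # assumed cluster state: one of the three tservers is down

Task = Tuple[str, int]  # (table name, replication factor)


def try_create(rf: int) -> bool:
    """Hypothetical stand-in for replica selection: succeeds only if
    there are at least `rf` live tablet servers."""
    return rf <= LIVE_TSERVERS


def process_abort_on_failure(pending: List[Task]) -> List[str]:
    """Models the behavior described above: the first failure ends the pass."""
    created = []
    for name, rf in pending:
        if not try_create(rf):
            return created  # the remaining tasks wait for the next pass
        created.append(name)
    return created


def process_isolated(pending: List[Task]) -> List[str]:
    """Alternative: a failed task is skipped and the others still run."""
    created = []
    for name, rf in pending:
        if try_create(rf):
            created.append(name)
    return created


pending = [("test_table", 3), ("small_table", 1)]
print(process_abort_on_failure(pending))  # []
print(process_isolated(pending))          # ['small_table']
```

      In the first variant the RF=1 task never runs while the RF=3 task sits ahead of it, which matches the behavior reported above.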

       

       

      Design

      A new flag, --support_create_tablet_without_enough_healthy_tservers, is added. The original behavior stays the same by default. When this flag is set to true, a three-replica tablet can be created successfully, and its status shows that one replica is missing. The tablet can be read and written normally.
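      The intended decision can be sketched as follows. This is a minimal model under stated assumptions, not the actual catalog manager logic: the parameter allow_under_replicated stands in for the new flag, and the requirement that at least a majority of the requested replicas be placeable is my assumption rather than something stated in this issue.

```python
class NotEnoughTabletServers(Exception):
    """Raised when a tablet cannot be placed on the live tablet servers."""


def replicas_to_place(rf: int, live_tservers: int,
                      allow_under_replicated: bool = False) -> int:
    """Return how many replicas to create now for a tablet with
    replication factor `rf`, or raise if creation must be rejected."""
    if live_tservers >= rf:
        return rf  # enough servers: original behavior, full replication
    # Assumption: with the flag on, accept the tablet only if the replicas
    # that can be placed still form a majority of `rf`.
    if allow_under_replicated and live_tservers >= rf // 2 + 1:
        return live_tservers
    raise NotEnoughTabletServers(
        f"Need at least {rf} replicas, but only {live_tservers} "
        "tablet servers are available")


# Flag on: a 3-replica tablet is created with 2 replicas for now.
print(replicas_to_place(3, 2, allow_under_replicated=True))
```

      With the flag off, the same call raises the error, which corresponds to the log line in the Background section.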

       

      There are 4 things to do:

      1. A tool to cancel a table creation task.
      2. A tool to show the running table creation tasks.
      3. A way to create a table without enough healthy tservers.
      4. Make a valid table creation task unaffected by other, invalid tasks.


          People

            Assignee: Unassigned
            Reporter: Xixu Wang (wangxixu)
