Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-13594

Use an ExecutorService for repair commands instead of new Thread(..).start()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:
      None

      Description

      Currently when starting a new repair, we create a new Thread and start it immediately

      It would be nice to be able to 1) limit the number of threads and 2) reject starting new repair commands if we are already running too many.

      1. 13594.png
        168 kB
        Marcus Eriksson

        Activity

        Hide
        jkni Joel Knighton added a comment -

        No problem - fix committed as 256a74faa31fcf25bdae753c563fa2c69f7f355c. Thanks!

        Show
        jkni Joel Knighton added a comment - No problem - fix committed as 256a74faa31fcf25bdae753c563fa2c69f7f355c . Thanks!
        Hide
        krummas Marcus Eriksson added a comment -

        +1, sorry about that, again...

        Show
        krummas Marcus Eriksson added a comment - +1, sorry about that, again...
        Hide
        jkni Joel Knighton added a comment -

        This causes a test failure in DatabaseDescriptorRefTest because of the new Config class - I've pushed a fix here.

        Show
        jkni Joel Knighton added a comment - This causes a test failure in DatabaseDescriptorRefTest because of the new Config class - I've pushed a fix here .
        Hide
        krummas Marcus Eriksson added a comment -

        Committed, thanks

        Show
        krummas Marcus Eriksson added a comment - Committed, thanks
        Hide
        aweisberg Ariel Weisberg added a comment -

        Unrelated but I did manage to reproduce short_read_test failing after letting it run a lot of times.

        Show
        aweisberg Ariel Weisberg added a comment - Unrelated but I did manage to reproduce short_read_test failing after letting it run a lot of times.
        Show
        krummas Marcus Eriksson added a comment - https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/170/ ( 13594.png )
        Hide
        aweisberg Ariel Weisberg added a comment - - edited

        I've noticed that short_read test seems to time out quite a bit. I'll run it locally a bit and see if it reproduces. I think I did before with no luck. This issue might only appear in Apache Jenkins.

        Show
        aweisberg Ariel Weisberg added a comment - - edited I've noticed that short_read test seems to time out quite a bit. I'll run it locally a bit and see if it reproduces. I think I did before with no luck. This issue might only appear in Apache Jenkins.
        Hide
        krummas Marcus Eriksson added a comment -

        Still trying to get dtests to run successfully, they seem quite broken right now

        Show
        krummas Marcus Eriksson added a comment - Still trying to get dtests to run successfully, they seem quite broken right now
        Hide
        aweisberg Ariel Weisberg added a comment - - edited

        The dtest you started has almost certainly been aged out. https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/145/

        I think we need to set up archiving of information from Apache Jenkins since they are throwing away everything after a mere 10 builds.

        Show
        aweisberg Ariel Weisberg added a comment - - edited The dtest you started has almost certainly been aged out. https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/145/ I think we need to set up archiving of information from Apache Jenkins since they are throwing away everything after a mere 10 builds.
        Hide
        krummas Marcus Eriksson added a comment -

        Yes, it is

        Since we are reusing threads in the ExecutorService we need to clear out the tracing state between runs, I have pushed a commit that does that and restarted the dtests.

        Show
        krummas Marcus Eriksson added a comment - Yes, it is Since we are reusing threads in the ExecutorService we need to clear out the tracing state between runs, I have pushed a commit that does that and restarted the dtests.
        Hide
        aweisberg Ariel Weisberg added a comment -

        Maybe a legit failure?
        thread_count_repair_test (repair_tests.repair_test.TestRepair) ... Build timed out (after 20 minutes). Marking the build as aborted.

        Show
        aweisberg Ariel Weisberg added a comment - Maybe a legit failure? thread_count_repair_test (repair_tests.repair_test.TestRepair) ... Build timed out (after 20 minutes). Marking the build as aborted.
        Show
        krummas Marcus Eriksson added a comment - https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/119/
        Hide
        aweisberg Ariel Weisberg added a comment -

        I think it's pretty unlikely there is a test dependency on repair command concurrency, but but maybe run the dtests just to be safe?

        The code itself looks good.

        Show
        aweisberg Ariel Weisberg added a comment - I think it's pretty unlikely there is a test dependency on repair command concurrency, but but maybe run the dtests just to be safe? The code itself looks good.
        Hide
        krummas Marcus Eriksson added a comment - - edited

        https://github.com/krummas/cassandra/commits/marcuse/limit_repair_command_threads

        introduces 2 new config variables:

        • repair_command_pool_size - how many repair commands can we run at the same time
        • repair_command_pool_full_strategy what do we do when the pool is full (queue or reject)
        Show
        krummas Marcus Eriksson added a comment - - edited https://github.com/krummas/cassandra/commits/marcuse/limit_repair_command_threads introduces 2 new config variables: repair_command_pool_size - how many repair commands can we run at the same time repair_command_pool_full_strategy what do we do when the pool is full ( queue or reject )

          People

          • Assignee:
            krummas Marcus Eriksson
            Reporter:
            krummas Marcus Eriksson
            Reviewer:
            Ariel Weisberg
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development