Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-17085

commit log was switched from non-daemon to daemon threads, which causes the JVM to exit in some case as no non-daemon threads are active

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 4.1
    • Test/dtest/python
    • None

    Description

      Right now bootstrap tests are failing every time we run, this work is to debug and fix the underling issue.

      Examples:

      https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7089

      >       node3.nodetool('bootstrap resume')
      
      bootstrap_test.py:1014: 
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:1005: in nodetool
          return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', '-p', str(self.jmx_port)] + shlex.split(cmd))
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      
      process = <subprocess.Popen object at 0x7fb071a03940>
      cmd_args = ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', ...]
      
          def handle_external_tool_process(process, cmd_args):
              out, err = process.communicate()
              if (out is not None) and isinstance(out, bytes):
                  out = out.decode()
              if (err is not None) and isinstance(err, bytes):
                  err = err.decode()
              rc = process.returncode
          
              if rc != 0:
      >           raise ToolError(cmd_args, rc, out, err)
      E           ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', '-p', '7300', 'bootstrap', 'resume'] exited with non-zero status; exit status: 1; 
      E           stderr: nodetool: Failed to connect to 'localhost:7300' - EOFException: 'null'.
      
      ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:2305: ToolError
      

      https://app.circleci.com/pipelines/github/dcapwell/cassandra/1062/workflows/ba3e6395-ef22-4724-8424-0549e65d8cff/jobs/7087

      >       node1.start()
      
      bootstrap_test.py:483: 
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:895: in start
          node.watch_log_for_alive(self, from_mark=mark)
      ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:664: in watch_log_for_alive
          self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, filename=filename)
      ../env3.6/lib/python3.6/site-packages/ccmlib/node.py:592: in watch_log_for
          head=reads[:50], tail="..."+reads[len(reads)-150:]))
      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
      
      start = 1635453190.3118386, timeout = 120
      msg = "Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:\n Head: \n Tail: ..."
      node = 'node3'
      
          @staticmethod
          def raise_if_passed(start, timeout, msg, node=None):
              if start + timeout < time.time():
      >           raise TimeoutError.create(start, timeout, msg, node)
      E           ccmlib.node.TimeoutError: 28 Oct 2021 20:35:10 [node3] after 120.12/120 seconds Missing: ['127.0.0.1:7000.* is now UP'] not found in system.log:
      E            Head: 
      E            Tail: ...
      

      Attachments

        Issue Links

          Activity

            People

              dcapwell David Capwell
              dcapwell David Capwell
              David Capwell, Sam Tunnicliffe
              Sam Tunnicliffe
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 10m
                  10m