Kudu / KUDU-2942

A rare flaky test for the aggregated live row count


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: test
    • Labels: None

      Description

      A few days ago, Adar hit a rare, flaky test failure for the live row count in TSAN mode.

       

      /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_tablet_manager-itest.cc:642
            Expected: live_row_count
            Which is: 327
      To be equal to: table_info->GetMetrics()->live_row_count->value()
            Which is: 654
      

      The metric value appears to be exactly doubled (654 = 2 × 327). Adar's full test output is in the attachment.

       

      I reviewed the previous patches and came up with a few speculative guesses. I think one of them could explain the issue:

      When a master has just become the leader and two heartbeat messages from the same tserver are processed in parallel at Line4239, the metric value can be doubled, because both handlers can access the same old tablet stats concurrently.
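      The suspected interleaving can be sketched in a few lines. This is only an illustration under stated assumptions, not Kudu's actual code: it assumes the table metric is updated as a delta (new stats minus old stats) and that the stored stats are zero right after a leader change. The names `TableMetrics`, `TabletStats`, `ApplyReport`, `SerializedHeartbeats`, and `RacingHeartbeats` are hypothetical. The race is replayed deterministically in a single thread so the outcome is reproducible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; not Kudu's real classes.
struct TableMetrics {
  int64_t live_row_count = 0;
};

struct TabletStats {
  int64_t live_row_count = 0;
};

// Assumed delta-style update: metric += new - old.
void ApplyReport(TableMetrics* metrics,
                 const TabletStats& old_stats,
                 const TabletStats& new_stats) {
  metrics->live_row_count +=
      new_stats.live_row_count - old_stats.live_row_count;
}

// Two heartbeats handled one after the other: the second one sees the
// stats stored by the first, so its delta is zero.
int64_t SerializedHeartbeats(int64_t rows) {
  TableMetrics metrics;
  TabletStats stored;  // assumed to be zero after a leader change
  for (int i = 0; i < 2; ++i) {
    TabletStats old_stats = stored;
    TabletStats reported;
    reported.live_row_count = rows;
    ApplyReport(&metrics, old_stats, reported);
    stored = reported;
  }
  return metrics.live_row_count;
}

// The suspected race, replayed step by step: both handlers read the old
// (zero) stats before either one stores the new stats.
int64_t RacingHeartbeats(int64_t rows) {
  TableMetrics metrics;
  TabletStats stored;  // assumed to be zero after a leader change
  TabletStats reported;
  reported.live_row_count = rows;

  TabletStats old_a = stored;  // handler A reads the old stats
  TabletStats old_b = stored;  // handler B reads them before A stores
  ApplyReport(&metrics, old_a, reported);
  stored = reported;
  ApplyReport(&metrics, old_b, reported);
  stored = reported;
  return metrics.live_row_count;
}
```

      Under these assumptions, `SerializedHeartbeats(327)` yields 327, while `RacingHeartbeats(327)` yields 654, which matches the doubled value in the test output.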

      Thus, the question becomes: how can two heartbeat messages from the same tserver be generated at the same time? One possible answer: the first heartbeat message and the second heartbeat message.

      Please keep in mind that the above case occurs in the integration test environment, not in production.

       

       

       

            People

            • Assignee: Unassigned
            • Reporter: helifu LiFu He
