Hadoop Common / HADOOP-11045

Introducing a tool to detect flaky tests of hadoop jenkins test job

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 2.6.0
    • Fix Version/s: 2.7.0
    • Component/s: build, tools
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      This JIRA introduces a tool to detect flaky tests in Hadoop Jenkins test jobs. It can also be adapted to projects other than Hadoop.

      I developed the tool on top of some initial work by Todd Lipcon, and we have found it quite useful. With Todd's agreement, I'd like to push it upstream so that everyone can use it (thanks, Todd, for the initial work and support). I hope you find the tool useful too.

      The idea is that when someone needs to know whether a test failure seen in a pre-build Jenkins run is flaky, they can run this tool to get a good idea. The tool can also be used to look at the failure trend of a test case in a given Jenkins job.

      This tool is for Hadoop contributors rather than Hadoop users. Thanks to Zhihong Yu for the advice to put it in the dev-support directory.

      Description of the tool:

      #
      # Given a Jenkins test job, this script examines all runs of the job done
      # within a specified period of time (number of days prior to the execution
      # time of this script), and reports all failed tests.
      #
      # The output of this script includes a section for each run that has failed
      # tests, with each failed test name listed.
      #
      # More importantly, at the end, it outputs a summary section that lists all
      # failed tests within all examined runs, indicates how many runs each test
      # failed in, and sorts the failed tests by that count.
      #
      # This way, when we see failed tests in a PreCommit build, we can quickly tell
      # whether a failed test is a new failure or has failed before, in which case
      # it may just be a flaky test.
      #
      # Of course, to be 100% sure about the reason for a failed test, a closer look
      # at the failed test for the specific run is necessary.
      #
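
The summary step described above could be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the function name `summarize_failures`, the in-memory `runs` list, and its field names (`timestamp_ms`, `failed_tests`) are all hypothetical, and fetching the data from Jenkins is left out entirely.

```python
import time
from collections import Counter


def summarize_failures(runs, num_prev_days, now=None):
    """Count, per test, how many recent runs it failed in.

    `runs` is a hypothetical in-memory list of dicts such as
    {"number": 101, "timestamp_ms": 1409000000000,
     "failed_tests": ["TestFoo.testBar"]}.
    """
    now = time.time() if now is None else now
    # Runs older than this cutoff (milliseconds since the epoch) are skipped.
    cutoff_ms = (now - num_prev_days * 86400) * 1000

    counts = Counter()
    for run in runs:
        if run["timestamp_ms"] < cutoff_ms:
            continue  # outside the requested window
        # set() so a test listed twice in one run counts once for that run
        counts.update(set(run["failed_tests"]))

    # Most frequently failing tests first; ties broken by name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A test that shows up near the top of the returned list has failed in many of the examined runs, which is the hint that it may be flaky rather than newly broken.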
      

      How to use the tool:

      Usage: determine-flaky-tests-hadoop.py [options]
      
      Options:
        -h, --help            show this help message and exit
        -J JENKINS_URL, --jenkins-url=JENKINS_URL
                              Jenkins URL
        -j JOB_NAME, --job-name=JOB_NAME
                              Job name to look at
        -n NUM_PREV_DAYS, --num-days=NUM_PREV_DAYS
                              Number of days to examine
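
For reference, a help text of this shape is what Python's standard `optparse` module generates. The sketch below reproduces the three options with it; whether the attached script actually uses `optparse`, and the helper name `build_parser`, are assumptions made for illustration.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with the three options listed in the usage text."""
    parser = OptionParser(usage="%prog [options]")
    # The dest names are chosen so the generated metavars match the help text
    # (JENKINS_URL, JOB_NAME, NUM_PREV_DAYS); they are otherwise an assumption.
    parser.add_option("-J", "--jenkins-url", dest="jenkins_url",
                      help="Jenkins URL")
    parser.add_option("-j", "--job-name", dest="job_name",
                      help="Job name to look at")
    parser.add_option("-n", "--num-days", dest="num_prev_days", type="int",
                      help="Number of days to examine")
    return parser


if __name__ == "__main__":
    options, _args = build_parser().parse_args()
    print(options.jenkins_url, options.job_name, options.num_prev_days)
```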
      

      Example command line:

      ./determine-flaky-tests-hadoop.py -J https://builds.apache.org -j PreCommit-HDFS-Build -n 2 
      

        Attachments

        1. HADOOP-11045.001.patch
          7 kB
          Yongjun Zhang
        2. HADOOP-11045.002.patch
          7 kB
          Yongjun Zhang
        3. HADOOP-11045.003.patch
          8 kB
          Yongjun Zhang
        4. HADOOP-11045.004.patch
          8 kB
          Yongjun Zhang
        5. HADOOP-11045.005.patch
          7 kB
          Yongjun Zhang
        6. HADOOP-11045.006.patch
          7 kB
          Yongjun Zhang
        7. HADOOP-11045.007.patch
          7 kB
          Yongjun Zhang

              People

              • Assignee: Yongjun Zhang (yzhangal)
              • Reporter: Yongjun Zhang (yzhangal)
              • Votes: 0
              • Watchers: 12
