Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-7076

TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build

    XMLWordPrintableJSON

Details

    • Reviewed

    Description

      TestNNBench#testNNBenchCreateReadAndDelete failed couple of times in our internal jenkins build.

      java.lang.AssertionError: create_write should create the file
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.assertTrue(Assert.java:41)
      	at org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
      

      Below is my analysis for why it didn't create the file.

      NNBench.java
      // Some comments here
        public void map(Text key, 
                  LongWritable value,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
            if (barrier()) {
              String fileName = "file_" + value;
              if (op.equals(OP_CREATE_WRITE)) {
                startTimeTPmS = System.currentTimeMillis();
                doCreateWriteOp(fileName, reporter);
              } ...
            } else {
              output.collect(new Text("l:latemaps"), new Text("1"));
            }
        // Below are the relevant parts of barrier() method
        private boolean barrier() {
          ..
          // If the sleep time is greater than 0, then sleep and return
          ...
          LOG.info("Waiting in barrier for: " + sleepTime + " ms");
          return retVal;
        }
        // Below are the relevant parts of the doCreateWriteOp
        private void doCreateWriteOp(String name,
                                       Reporter reporter) {
          FSDataOutputStream out;
          byte[] buffer = new byte[bytesToWrite];  
          for (long l = 0l; l < numberOfFiles; l++) {
            Path filePath = new Path(new Path(baseDir, dataDirName), 
                    name + "_" + l);
          }
        ....
        }
      

      This file BASE_DIR/data/file_0_0 is getting created only if the map task starts before the time mentioned by startTime.
      Refer the chunk which I pasted above.
      map(..) --> barrier() and only if barrier() evaluates to true it will call doCreateWriteOp which will eventually create the file.
      In test case, the delay value is 3 seconds as per "-startTime", "" + (Time.now() / 1000 + 3)
      In this failing test case, I can see the task starting minimum 6 seconds after the test case started.

      2017-01-27 03:11:15,387 INFO  [Thread-4] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1711545156_0001
      2017-01-27 03:11:23,405 INFO  [Thread-4] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://localhost:8080/
      

      Also when I run this test on my laptop, I see the following line being printed.

      2017-01-27 17:09:27,982 INFO  [LocalJobRunner Map Task Executor #0] hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
      

      This line will be printed only in barrier() method and I don't see this line in the logs of failed test.

       
      In our environment, the jenkins server was very slow and it took more than 6 seconds to launch a map task.
      The correct fix in my opinion would be to return true in case there is no sleep in barrier() method. Only in exception, it should return false.

      Attachments

        Issue Links

          Activity

            People

              pingsutw Kevin Su
              shahrs87 Rushabh Shah
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: