Apache Tez / TEZ-4364

TestFaultTolerance timeout on master - TestInput fix after TEZ-4338


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.2
    • Fix Version/s: 0.10.2
    • Component/s: None
    • Labels: None

    Description

      TL;DR: after TEZ-4338, setDestinationLocalhostName hit an NPE because an InputReadErrorEvent was created with a null destinationLocalHostName. This is unlikely to happen in production, because production code does not use the 3-parameter InputReadErrorEvent.create(...) overload.
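      A minimal, hypothetical Java sketch of the failure mode described above: a 3-argument factory leaves the destination host name null, and a later conversion step passes it to a setter that rejects null. The names ReadErrorEvent, EventProtoBuilder, toProtoUnguarded and toProtoGuarded are illustrative, not Tez classes. The null-rejecting setter mimics protobuf-generated string setters; it is an assumption that the real setDestinationLocalhostName failed the same way. The guarded variant shows one way to skip an absent value and is not necessarily the fix applied in this issue.

```java
import java.util.Objects;

public class NullHostNameSketch {

    // Hypothetical stand-in for a protobuf-style builder: generated protobuf
    // setters for string fields reject null, which is mimicked here.
    static final class EventProtoBuilder {
        private String destinationLocalhostName = "";

        EventProtoBuilder setDestinationLocalhostName(String value) {
            this.destinationLocalhostName = Objects.requireNonNull(value);
            return this;
        }

        String build() {
            return "dest=" + destinationLocalhostName;
        }
    }

    // Hypothetical event: the 3-argument factory leaves the host name null.
    static final class ReadErrorEvent {
        final String diagnostics;
        final int index;
        final int version;
        final String destinationLocalhostName;

        private ReadErrorEvent(String diagnostics, int index, int version, String host) {
            this.diagnostics = diagnostics;
            this.index = index;
            this.version = version;
            this.destinationLocalhostName = host;
        }

        static ReadErrorEvent create(String diagnostics, int index, int version) {
            return new ReadErrorEvent(diagnostics, index, version, null);
        }

        static ReadErrorEvent create(String diagnostics, int index, int version, String host) {
            return new ReadErrorEvent(diagnostics, index, version, host);
        }
    }

    // Unguarded conversion: throws NullPointerException for an event from the 3-arg factory.
    static String toProtoUnguarded(ReadErrorEvent e) {
        return new EventProtoBuilder().setDestinationLocalhostName(e.destinationLocalhostName).build();
    }

    // Guarded conversion: only sets the field when a host name is present.
    static String toProtoGuarded(ReadErrorEvent e) {
        EventProtoBuilder b = new EventProtoBuilder();
        if (e.destinationLocalhostName != null) {
            b.setDestinationLocalhostName(e.destinationLocalhostName);
        }
        return b.build();
    }

    public static void main(String[] args) {
        ReadErrorEvent threeArg = ReadErrorEvent.create("fetch failed", 0, 0);
        try {
            toProtoUnguarded(threeArg);
            System.out.println("no exception");
        } catch (NullPointerException npe) {
            System.out.println("NPE from unguarded conversion");
        }
        System.out.println(toProtoGuarded(threeArg));
        System.out.println(toProtoGuarded(ReadErrorEvent.create("fetch failed", 0, 0, "host-1")));
    }
}
```

      Running main prints the NPE message for the 3-argument event, followed by "dest=" and "dest=host-1" from the guarded conversions.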

      TestFaultTolerance has become flakier recently. It is important to investigate, because a unit test failure could also indicate a product bug in how failure scenarios are handled.

      According to the jstack of the surefire process (surefire_jstack.log), the hang is reproduced only by TestFaultTolerance.testBasicInputFailureWithoutExitDeadline:

      "Thread-1355" #1569 prio=5 os_prio=31 tid=0x00007fe76660c800 nid=0x43d07 waiting on condition [0x000070002ab38000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
      	at java.lang.Thread.sleep(Native Method)
      	at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:155)
      	at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:142)
      	at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:138)
      	at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithoutExitDeadline(TestFaultTolerance.java:351)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
      

      This is where the test waits for the DAG to finish.
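      The jstack shows the test thread in Thread.sleep inside runDAGAndVerify, that is, in a poll-and-sleep loop waiting for the DAG to reach a final state. If the DAG never completes, such a loop keeps sleeping until the JUnit timeout (FailOnTimeout in the trace) fires. A minimal, hypothetical sketch of that pattern follows; State and waitForCompletion are illustrative names, not Tez or TestFaultTolerance code, and the per-call deadline is an assumed addition that makes a stuck DAG return early instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class DagWaitSketch {

    // Hypothetical DAG states; a real test would query the DAG status through its client.
    enum State { RUNNING, SUCCEEDED, FAILED, KILLED }

    static boolean isTerminal(State s) {
        return s == State.SUCCEEDED || s == State.FAILED || s == State.KILLED;
    }

    // Polls the status supplier, sleeping between polls, until a terminal state is seen
    // or the deadline passes. Returns the last observed state, so RUNNING means "timed out".
    static State waitForCompletion(Supplier<State> status, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        State s = status.get();
        // Overflow-safe comparison of nanoTime values.
        while (!isTerminal(s) && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return s;
            }
            s = status.get();
        }
        return s;
    }

    public static void main(String[] args) {
        // Simulated DAG that reaches SUCCEEDED on the third poll.
        int[] polls = {0};
        System.out.println(waitForCompletion(
                () -> ++polls[0] >= 3 ? State.SUCCEEDED : State.RUNNING, 1000, 10));
        // Simulated DAG that never finishes: the bounded loop returns instead of sleeping forever.
        System.out.println(waitForCompletion(() -> State.RUNNING, 100, 10));
    }
}
```

      Returning the last observed state, rather than throwing, leaves it to the caller to decide whether RUNNING after the deadline should fail the test.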

      Attachments

        1. surefire_jstack.log
          489 kB
          László Bodor
        2. syslog_attempt_1640554229092_0001_1_01_000002_0
          10 kB
          László Bodor


            People

              Assignee: László Bodor (abstractdog)
              Reporter: László Bodor (abstractdog)
              Votes: 0
              Watchers: 2


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m