Uploaded image for project: 'Apache Hudi'
  1. Apache Hudi
  2. HUDI-6755

UT for Job module 4 is stuck for 4 hours

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • tests-ci
    • None

    Description

      Of late, we are incurring timeouts for job module 4 in azure for UT module. 

      Its very strange from the logs. There are only 15 tests triggered and all 15 of them are complete. And the test is just stuck in there. after 3 hour 40 mins or so, entire job run is cancelled and we don't see any logs in between. 

       

      Excerpt from logs:

      2023-08-23T08:16:30.2299637Z [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 s - in org.apache.hudi.hadoop.TestAnnotation
      2023-08-23T08:16:30.2319408Z [INFO] Running org.apache.hudi.hadoop.hive.TestHoodieCombineHiveInputFormat
      2023-08-23T08:16:30.2421972Z Formatting using clusterid: testClusterID
      2023-08-23T08:16:30.2570394Z 31120 [Listener at localhost/41109] WARN  org.apache.hadoop.conf.Configuration [] - No unit for dfs.heartbeat.interval(3) assuming SECONDS
      2023-08-23T08:16:30.2574429Z 31121 [Listener at localhost/41109] WARN  org.apache.hadoop.conf.Configuration [] - No unit for dfs.namenode.safemode.extension(0) assuming MILLISECONDS
      2023-08-23T08:16:30.5588638Z 31422 [Listener at localhost/41109] WARN  org.apache.hadoop.conf.Configuration [] - No unit for dfs.heartbeat.interval(3) assuming SECONDS
      2023-08-23T08:16:30.5589693Z 31422 [Listener at localhost/41109] WARN  org.apache.hadoop.conf.Configuration [] - No unit for dfs.namenode.safemode.extension(0) assuming MILLISECONDS
      2023-08-23T08:16:30.6633485Z 31526 [Listener at localhost/33189] WARN  org.apache.hadoop.conf.Configuration [] - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
      2023-08-23T08:16:34.4630382Z 35326 [Listener at localhost/41177] WARN  org.apache.hadoop.hdfs.server.datanode.DirectoryScanner [] - DirectoryScanner: shutdown has been called
      2023-08-23T08:16:34.5845964Z 35447 [BP-118616970-10.1.148.0-1692778590260 heartbeating to localhost/127.0.0.1:33189] WARN  org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager [] - IncrementalBlockReportManager interrupted
      2023-08-23T08:16:34.5895718Z 35447 [BP-118616970-10.1.148.0-1692778590260 heartbeating to localhost/127.0.0.1:33189] WARN  org.apache.hadoop.hdfs.server.datanode.DataNode [] - Ending block pool service for: Block pool BP-118616970-10.1.148.0-1692778590260 (Datanode Uuid eaf09d45-eb56-49ca-bec3-2da8a8f7bd14) service to localhost/127.0.0.1:33189
      2023-08-23T08:16:34.5952216Z 35458 [refreshUsed-/tmp/hdfs-test-service16927785902315510072575960611105/data/data2/current/BP-118616970-10.1.148.0-1692778590260] WARN  org.apache.hadoop.fs.CachingGetSpaceUsed [] - Thread Interrupted waiting to refresh disk information: sleep interrupted
      2023-08-23T08:16:34.5971425Z 35458 [refreshUsed-/tmp/hdfs-test-service16927785902315510072575960611105/data/data1/current/BP-118616970-10.1.148.0-1692778590260] WARN  org.apache.hadoop.fs.CachingGetSpaceUsed [] - Thread Interrupted waiting to refresh disk information: sleep interrupted
      2023-08-23T08:16:34.6255412Z 35488 [2085456234@qtp-936129838-1 - Acceptor0 HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45383] WARN  org.apache.hadoop.http.HttpServer2 [] - HttpServer Acceptor: isRunning is false. Rechecking.
      2023-08-23T08:16:34.6289510Z 35492 [2085456234@qtp-936129838-1 - Acceptor0 HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:45383] WARN  org.apache.hadoop.http.HttpServer2 [] - HttpServer Acceptor: isRunning is false
      2023-08-23T08:16:34.7378002Z [WARNING] Tests run: 4, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 4.503 s - in org.apache.hudi.hadoop.hive.TestHoodieCombineHiveInputFormat
      2023-08-23T12:00:25.4660529Z ##[error]The operation was canceled.
      2023-08-23T12:00:25.4674798Z ##[section]Finishing: UT other modules 

      Just check for log timing for last 3 lines. 

       

      Ref:  https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_apis/build/builds/19423/logs/21

       

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            shivnarayan sivabalan narayanan
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: