FLINK-14843 (project: Flink)

Streaming bucketing end-to-end test can fail with Output hash mismatch

      Description
      The streaming bucketing end-to-end test (test_streaming_bucketing.sh) can fail with an "Output hash mismatch" error, as in this run:

      Number of running task managers has reached 4.
      Job (e0b7a86e4d4111f3947baa3d004e083a) is running.
      Waiting until all values have been produced
      Truncating buckets
      Number of produced values 26930/60000
      Truncating buckets
      Number of produced values 30890/60000
      Truncating buckets
      Number of produced values 37340/60000
      Truncating buckets
      Number of produced values 41290/60000
      Truncating buckets
      Number of produced values 46710/60000
      Truncating buckets
      Number of produced values 52120/60000
      Truncating buckets
      Number of produced values 57110/60000
      Truncating buckets
      Number of produced values 62530/60000
      Cancelling job e0b7a86e4d4111f3947baa3d004e083a.
      Cancelled job e0b7a86e4d4111f3947baa3d004e083a.
      Waiting for job (e0b7a86e4d4111f3947baa3d004e083a) to reach terminal state CANCELED ...
      Job (e0b7a86e4d4111f3947baa3d004e083a) reached terminal state CANCELED
      Job e0b7a86e4d4111f3947baa3d004e083a was cancelled, time to verify
      FAIL Bucketing Sink: Output hash mismatch.  Got 9e00429abfb30eea4f459eb812b470ad, expected 01aba5ff77a0ef5e5cf6a727c248bdc3.
      head hexdump of actual:
      0000000   (   2   ,   1   0   ,   0   ,   S   o   m   e       p   a   y
      0000010   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   1
      0000020   ,   S   o   m   e       p   a   y   l   o   a   d   .   .   .
      0000030   )  \n   (   2   ,   1   0   ,   2   ,   S   o   m   e       p
      0000040   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0
      0000050   ,   3   ,   S   o   m   e       p   a   y   l   o   a   d   .
      0000060   .   .   )  \n   (   2   ,   1   0   ,   4   ,   S   o   m   e
      0000070       p   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,
      0000080   1   0   ,   5   ,   S   o   m   e       p   a   y   l   o   a
      0000090   d   .   .   .   )  \n   (   2   ,   1   0   ,   6   ,   S   o
      00000a0   m   e       p   a   y   l   o   a   d   .   .   .   )  \n   (
      00000b0   2   ,   1   0   ,   7   ,   S   o   m   e       p   a   y   l
      00000c0   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   8   ,
      00000d0   S   o   m   e       p   a   y   l   o   a   d   .   .   .   )
      00000e0  \n   (   2   ,   1   0   ,   9   ,   S   o   m   e       p   a
      00000f0   y   l   o   a   d   .   .   .   )  \n                        
      00000fa
      Stopping taskexecutor daemon (pid: 55164) on host gyao-desktop.
      Stopping standalonesession daemon (pid: 51073) on host gyao-desktop.
      Stopping taskexecutor daemon (pid: 51504) on host gyao-desktop.
      Skipping taskexecutor daemon (pid: 52034), because it is not running anymore on gyao-desktop.
      Skipping taskexecutor daemon (pid: 52472), because it is not running anymore on gyao-desktop.
      Skipping taskexecutor daemon (pid: 52916), because it is not running anymore on gyao-desktop.
      Stopping taskexecutor daemon (pid: 54121) on host gyao-desktop.
      Stopping taskexecutor daemon (pid: 54726) on host gyao-desktop.
      [FAIL] Test script contains errors.
      Checking of logs skipped.
      
      [FAIL] 'flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh' failed after 2 minutes and 3 seconds! Test exited with exit code 1
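The failure message appears to come from a step that hashes the job's output and compares the result with a fixed expected value; the 32-hex-digit values in the log are consistent with an MD5 digest. To inspect a failed run by hand, a check of that kind can be recomputed on the output files. This is a minimal sketch under stated assumptions: the digest is taken over all output lines sorted byte-wise, and `verify_output_hash` and the `part-` file prefix are hypothetical names, not necessarily what test_streaming_bucketing.sh uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the Flink test library): recompute an output hash
# the way the failing check appears to, assuming an MD5 digest over all output
# lines sorted byte-wise.
verify_output_hash() {
  local name=$1            # label used in the pass/FAIL message
  local outfile_prefix=$2  # common prefix of all output files (assumption)
  local expected=$3        # expected MD5 digest, 32 hex characters
  local actual

  # Sort every line of every matching file with LC_ALL=C so that the digest
  # does not depend on which file, or in which order, a record was written.
  if command -v md5sum >/dev/null 2>&1; then
    actual=$(LC_ALL=C sort "$outfile_prefix"* | md5sum | awk '{print $1}')
  elif command -v md5 >/dev/null 2>&1; then
    actual=$(LC_ALL=C sort "$outfile_prefix"* | md5 -q)
  else
    echo "Neither md5sum nor md5 is available." >&2
    return 2
  fi

  if [ "$actual" != "$expected" ]; then
    echo "FAIL $name: Output hash mismatch.  Got $actual, expected $expected."
    # Dump the first lines of the actual output byte by byte for inspection.
    echo "head of actual output:"
    head "$outfile_prefix"* | od -c
    return 1
  fi
  echo "pass $name"
}
```

For example, `verify_output_hash "Bucketing Sink" /path/to/out/part- 01aba5ff77a0ef5e5cf6a727c248bdc3`, where the path is a placeholder. Because every line contributes to the digest, a single duplicated or missing record is enough to produce a different hash.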
      

      How to reproduce
      Comment out the 10-second sleep after the first TaskManager (TM) is restarted to provoke the issue:

      echo "Restarting 1 TM"
      $FLINK_DIR/bin/taskmanager.sh start
      wait_for_number_of_running_tms 4
      
      #sleep 10
      
      echo "Killing 2 TMs"
      kill_random_taskmanager
      kill_random_taskmanager
      wait_for_number_of_running_tms 2
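To apply this change without editing the script by hand (for example, in a job that should provoke the failure repeatedly), the `sleep 10` line can be commented out with a small awk filter. This is a sketch: `disable_restart_delay` is a hypothetical helper, and it assumes the delay is written exactly as `sleep 10` on its own line directly after `wait_for_number_of_running_tms 4` (blank lines in between are allowed). Review the result with `git diff` before running the test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the "sleep 10" that directly follows
# "wait_for_number_of_running_tms 4" (blank lines in between are allowed).
# Every other line, including other "sleep 10" calls, is left unchanged.
disable_restart_delay() {
  local script=$1
  local tmp
  tmp=$(mktemp) || return 1
  awk '
    # Arm the match on the line that waits for the restarted TM.
    /^[ \t]*wait_for_number_of_running_tms 4[ \t]*$/ { armed = 1; print; next }
    # The first "sleep 10" after it gets a "#" prefix, keeping its indentation.
    armed && /^[ \t]*sleep 10[ \t]*$/ { sub(/sleep 10/, "#sleep 10"); armed = 0; print; next }
    # Any other non-blank line means the delay is not where we expect it.
    armed && NF > 0 { armed = 0 }
    { print }
  ' "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the script keeps its original file permissions.
  cat "$tmp" > "$script"
  rm -f "$tmp"
}
```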
      

      Command to run the test:

      FLINK_DIR=build-target/ flink-end-to-end-tests/run-single-test.sh skip flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
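Because the failure is intermittent, it can help to rerun the test until it fails. A minimal sketch of such a loop, assuming the test command exits with a non-zero status on failure (the log of the failing run ends with "Test exited with exit code 1"); `run_until_failure` is a hypothetical helper, not part of the Flink test scripts:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: rerun a command until it exits non-zero or max_runs
# attempts have passed, stopping at the first failure.
run_until_failure() {
  local max_runs=$1
  shift
  local run=1
  while [ "$run" -le "$max_runs" ]; do
    echo "Run $run/$max_runs: $*"
    if ! "$@"; then
      echo "Failed on run $run of $max_runs."
      return 1
    fi
    run=$((run + 1))
  done
  echo "No failure in $max_runs runs."
  return 0
}
```

For example, after `export FLINK_DIR=build-target/`, run `run_until_failure 20 flink-end-to-end-tests/run-single-test.sh skip flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh`. The limit of 20 runs is arbitrary; whether the test cleans up its cluster and output between runs is not covered by this sketch.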
      

      Attachments

        1. complete_result (1.70 MB, Gary Yao)
        2. flink-gary-standalonesession-0-gyao-desktop.log (242 kB, Gary Yao)
        3. flink-gary-taskexecutor-0-gyao-desktop.log (240 kB, Gary Yao)
        4. flink-gary-taskexecutor-1-gyao-desktop.log (64 kB, Gary Yao)
        5. flink-gary-taskexecutor-2-gyao-desktop.log (54 kB, Gary Yao)
        6. flink-gary-taskexecutor-3-gyao-desktop.log (67 kB, Gary Yao)
        7. flink-gary-taskexecutor-4-gyao-desktop.log (103 kB, Gary Yao)
        8. flink-gary-taskexecutor-5-gyao-desktop.log (91 kB, Gary Yao)
        9. flink-gary-taskexecutor-6-gyao-desktop.log (101 kB, Gary Yao)

      People

        PengFei Li (banmoy)
        Gary Yao (gjy)

      Time Tracking

        Original estimate: Not specified
        Remaining estimate: 0h
        Time spent: 20m