  Flink / FLINK-8526

When the parallelism equals half the number of CPU cores, join and shuffle operators easily cause a deadlock.


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      8 machines (96 GB RAM and 24 cores each), 20 task slots per TaskManager, twitter-2010 dataset, parallelism set to 80. The code runs in standalone mode.

      Description

      The attached programs get stuck at certain parallelism values in some situations. With the setup above and a parallelism of 80, the program always gets stuck; with a parallelism of 100, everything works. According to my investigation, when the parallelism equals the number of task slots the program is not at its fastest, probably because there are not enough network buffers. How are network buffers related to parallelism, and how is parallelism related to the number of running tasks? (In other words, we have 160 task slots, but the number of running tasks can be far larger than the number of slots.)
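      To put rough numbers on the two questions above, here is a hedged back-of-the-envelope sketch. It assumes Flink's default slot sharing (a job needs as many slots as its highest vertex parallelism, while every parallel instance of every job vertex is its own running task) and the rule of thumb from the Flink 1.x network-buffer configuration docs of about #slots-per-TM^2 * #TMs * 4 buffers per TaskManager, with the default 32 KiB segment size. The five-vertex job is hypothetical; the real vertex count of the attached programs is not stated in this report.

```java
import java.util.Arrays;

/** Rough estimates for a Flink standalone cluster (assumes default slot sharing). */
final class ClusterEstimate {

    /** Assumed default network memory segment size: 32 KiB. */
    static final long SEGMENT_SIZE_BYTES = 32L * 1024;

    /** With slot sharing, slots needed = highest parallelism of any job vertex. */
    static int slotsNeeded(int[] vertexParallelism) {
        return Arrays.stream(vertexParallelism).max().orElse(0);
    }

    /** Each parallel instance of each (chained) job vertex is one running task. */
    static int runningTasks(int[] vertexParallelism) {
        return Arrays.stream(vertexParallelism).sum();
    }

    /** Rule of thumb (Flink 1.x docs): #slots-per-TM^2 * #TMs * 4 buffers per TaskManager. */
    static long recommendedBuffersPerTaskManager(int slotsPerTm, int taskManagers) {
        return (long) slotsPerTm * slotsPerTm * taskManagers * 4;
    }

    public static void main(String[] args) {
        // Cluster from this report: 8 TaskManagers with 20 slots each.
        int slotsPerTm = 20;
        int taskManagers = 8;
        // Hypothetical job: five non-chained vertices, all at parallelism 80.
        int[] job = {80, 80, 80, 80, 80};

        long buffers = recommendedBuffersPerTaskManager(slotsPerTm, taskManagers);
        System.out.println("total slots        = " + slotsPerTm * taskManagers);
        System.out.println("slots needed       = " + slotsNeeded(job));
        System.out.println("running tasks      = " + runningTasks(job));
        System.out.println("buffers per TM     = " + buffers);
        System.out.println("network MiB per TM = " + buffers * SEGMENT_SIZE_BYTES / (1024 * 1024));
    }
}
```

      Under these assumptions the job occupies 80 of the 160 slots but runs 400 tasks, and the rule of thumb suggests roughly 12,800 buffers (about 400 MiB) of network memory per TaskManager. This is an estimate only, not a diagnosis of the reported hang.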

      The parallelism cannot be set to half the number of CPU cores.

      Otherwise a "java.io.FileNotFoundException" is thrown. You can reproduce the exception on your own PC by setting the parallelism to half the number of its CPU cores.
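      A minimal sketch for picking the reproduction parallelism on a local machine, assuming "half of your CPU cores" means half of what the JVM reports via Runtime.availableProcessors(), clamped to at least 1. The class and method names are hypothetical; in a DataSet job the resulting value would be passed to ExecutionEnvironment#setParallelism.

```java
/** Hypothetical helper: parallelism to try when reproducing the reported exception. */
final class ReproParallelism {

    /** Half the core count (integer division), but never less than 1. */
    static int halfOfCores(int cores) {
        return Math.max(1, cores / 2);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int parallelism = halfOfCores(cores);
        // In the Flink program this value would be set with env.setParallelism(parallelism).
        System.out.println("cores = " + cores + ", parallelism to try = " + parallelism);
    }
}
```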

        Attachments

        1. T2AdjActiveV.java
          8 kB
          zhu.qing
        2. T2AdjMessage.java
          7 kB
          zhu.qing


            People

            • Assignee: Unassigned
            • Reporter: skullpirate.qing (zhu.qing)
            • Votes: 1
            • Watchers: 3


                Time Tracking

                • Estimated (original estimate): 72h
                • Remaining (remaining estimate): 72h
                • Logged (time spent): Not Specified