  1. Flink
  2. FLINK-13245

Network stack is leaking files

Details

    Description

      There is a file leak in the network stack / shuffle service.

      When running SlotCountExceedingParallelismTest on Windows, a large number of .channel files are left behind in a flink-netty-shuffle-XXX directory.
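
      To check whether a run left such files behind, a small scan of the temp directory can help. The following is a minimal sketch, not part of Flink: the class name LeakedChannelFileFinder is hypothetical, the directory prefix and file suffix are taken from the report above, and the default base directory (java.io.tmpdir) is an assumption about where the shuffle directories are created.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical diagnostic helper; not part of Flink. */
class LeakedChannelFileFinder {

    /** Collects all *.channel files below any flink-netty-shuffle-* directory in baseDir. */
    static List<Path> findChannelFiles(Path baseDir) throws IOException {
        List<Path> leaked = new ArrayList<>();
        // Directory prefix as named in the report; only direct children of baseDir are matched.
        try (DirectoryStream<Path> shuffleDirs =
                Files.newDirectoryStream(baseDir, "flink-netty-shuffle-*")) {
            for (Path shuffleDir : shuffleDirs) {
                if (!Files.isDirectory(shuffleDir)) {
                    continue;
                }
                // Walk the shuffle directory recursively and keep regular .channel files.
                try (Stream<Path> files = Files.walk(shuffleDir)) {
                    files.filter(Files::isRegularFile)
                         .filter(p -> p.getFileName().toString().endsWith(".channel"))
                         .forEach(leaked::add);
                }
            }
        }
        return leaked;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: shuffle directories live under the JVM temp dir unless a path is given.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        List<Path> leaked = findChannelFiles(baseDir);
        System.out.println(leaked.size() + " .channel file(s) found under " + baseDir);
        leaked.forEach(System.out::println);
    }
}
```

      Running it once before and once after the test gives a rough count of the files the test leaked.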

      From what I've gathered so far, these files are still being used by a BoundedBlockingSubpartition. The cleanup logic in this class uses ref-counting to ensure that data is not released while a reader is still present. However, at the end of the job this count has not reached 0, so nothing is released.
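
      The failure mode can be illustrated with a simplified ref-counting sketch. This is not the actual BoundedBlockingSubpartition code: the class RefCountedChannelFile and its method names are hypothetical and only model the behaviour described above, where the backing file is deleted only once the partition is released and no reader is registered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a file-backed subpartition; not Flink's implementation. */
class RefCountedChannelFile {
    private final Path file;
    private int readers;       // number of readers that have not yet reported completion
    private boolean released;  // set once the owner no longer needs the data

    RefCountedChannelFile(Path file) {
        this.file = file;
    }

    synchronized void addReader() {
        if (released) {
            throw new IllegalStateException("already released");
        }
        readers++;
    }

    /** The "consumed" notification; if it never arrives, the file is never deleted. */
    synchronized void readerFinished() throws IOException {
        if (readers == 0) {
            throw new IllegalStateException("no reader registered");
        }
        readers--;
        deleteIfUnused();
    }

    synchronized void release() throws IOException {
        released = true;
        deleteIfUnused();
    }

    // Called only from synchronized methods: delete once released and no reader remains.
    private void deleteIfUnused() throws IOException {
        if (released && readers == 0) {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".channel");
        RefCountedChannelFile partition = new RefCountedChannelFile(file);
        partition.addReader();
        partition.release(); // job ends, but the reader has not reported completion
        System.out.println("after release: file exists = " + Files.exists(file));
        partition.readerFinished(); // the notification that appears to be missing
        System.out.println("after readerFinished: file exists = " + Files.exists(file));
    }
}
```

      In this sketch the file survives release() as long as one reader is still counted, which matches the symptom in the report: if the completion notification is lost, the count never reaches 0 and the file stays on disk.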

      The same issue is also present at the ResultPartition level: ReleaseOnConsumptionResultPartition instances are also being released while their ref-count is still greater than 0.

      Overall, it appears that there is an issue with the notifications for partitions being consumed.

      It is possible that this issue recently caused failures on Travis, where builds were failing due to a lack of disk space.

            People

              zjwang Zhijiang
              chesnay Chesnay Schepler
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m