[FLINK-6873] Limit the number of open writers in file system connector - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Connectors / Common, Connectors / FileSystem, Runtime / Task
Labels:
- auto-deprioritized-major
- auto-deprioritized-minor

Description

Mail list discuss:
https://mail.google.com/mail/u/1/#label/MailList%2Fflink-dev/15c869b2a5b20d43

Following exception will occur when Flink is writing to too many files:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.hdfs.DFSOutputStream.start(DFSOutputStream.java:2170)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1685)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1689)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1624)
at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:459)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
at org.apache.flink.streaming.connectors.fs.StreamWriterBase.open(StreamWriterBase.java:120)
at org.apache.flink.streaming.connectors.fs.StringWriter.open(StringWriter.java:62)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.openNewPartFile(BucketingSink.java:545)
at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.invoke(BucketingSink.java:440)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:41)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:528)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:503)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:483)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:891)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:869)
at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:103)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecord(AbstractFetcher.java:230)
at org.apache.flink.streaming.connectors.kafka.internals.SimpleConsumerThread.run(SimpleConsumerThread.java:379)

Letting developers decide the max open connections to the open files would be great.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Mu Kong

Votes:: 1 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 08/Jun/17 14:20

Updated:: 17/Dec/21 22:38