Flink / FLINK-9831

Too many open files for RocksDB

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Labels: None

      Description

      While running a single Flink job backed by RocksDB with checkpointing to HDFS, we encounter an exception: the TaskManager (TM) cannot access an SST file because the process has too many open files. However, we have already increased the soft and hard open-file limits on the machine.

      Number of open files for the TM process on the machine:

       

      lsof -p 23301|wc -l
      8241
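Note that `lsof -p` also prints a header line and entries that are not file descriptors (for example the working directory, the program text, and memory-mapped files), so the figure above may overstate what counts against the open-file limit. A minimal sketch for counting only descriptors, assuming a Linux host with `/proc` mounted; the helper name `count_open_fds` and its optional second argument are hypothetical and not part of Flink:

```shell
#!/bin/sh
# count_open_fds PID [PROC_ROOT]
# Counts the entries in <PROC_ROOT>/<PID>/fd; each entry is one open descriptor.
# PROC_ROOT defaults to /proc and is overridable only so the sketch can be
# exercised against a test directory (assumption, for illustration).
count_open_fds() {
    ls "${2:-/proc}/$1/fd" | wc -l
}

# Example, using the PID from the lsof command above
# (requires permission to read that process's fd directory):
# count_open_fds 23301
```

Running this a few times during restore would show whether the descriptor count is actually approaching the limit.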

       

      Instance limits:

       

      ulimit -a
      core file size (blocks, -c) 0
      data seg size (kbytes, -d) unlimited
      scheduling priority (-e) 0
      file size (blocks, -f) unlimited
      pending signals (-i) 256726
      max locked memory (kbytes, -l) 64
      max memory size (kbytes, -m) unlimited
      open files (-n) 1048576
      pipe size (512 bytes, -p) 8
      POSIX message queues (bytes, -q) 819200
      real-time priority (-r) 0
      stack size (kbytes, -s) 8192
      cpu time (seconds, -t) unlimited
      max user processes (-u) 128000
      virtual memory (kbytes, -v) unlimited
      file locks (-x) unlimited
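The `ulimit -a` output above describes the shell in which it was run. Limits are inherited per process, so a TM started by a different user, service manager, or launcher script may be running with other values. A minimal sketch for reading the limit the process itself sees, assuming Linux and the PID 23301 from the lsof command above; the helper name `open_file_limits` is hypothetical:

```shell
#!/bin/sh
# open_file_limits LIMITS_FILE
# Prints the soft and hard "Max open files" values from a file in the
# /proc/<pid>/limits format (columns: name, soft limit, hard limit, units).
open_file_limits() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
}

# Example for the TM process (PID is an assumption taken from the report):
# open_file_limits /proc/23301/limits
```

If the values printed for the TM differ from the 1048576 shown by `ulimit -a`, the raised limit is not reaching the TM process.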
       
      

       

      Stack trace (flink_open_files.txt):
      java.lang.Exception: Exception while creating StreamOperatorStateContext.
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
      at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:227)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:730)
      at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:295)
      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for KeyedCoProcessOperator_98a16ed3228ec4a08acd8d78420516a1_(1/1) from any of the 1 provided restore options.
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:276)
      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:132)
      ... 5 more
      Caused by: java.io.FileNotFoundException: /tmp/flink-io-3da06c9e-f619-44c9-b95f-54ee9b1a084a/job_b3ecbdc0eb2dc2dfbf5532ec1fcef9da_op_KeyedCoProcessOperator_98a16ed3228ec4a08acd8d78420516a1__1_1__uuid_c4b82a7e-8a04-4704-9e0b-393c3243cef2/3701639a-bacd-4861-99d8-5f3d112e88d6/000016.sst (Too many open files)
      at java.io.FileOutputStream.open0(Native Method)
      at java.io.FileOutputStream.open(FileOutputStream.java:270)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
      at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
      at org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:47)
      at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:275)
      at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:121)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.copyStateDataHandleData(RocksDBKeyedStateBackend.java:1008)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllDataFromStateHandles(RocksDBKeyedStateBackend.java:988)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllStateDataToDirectory(RocksDBKeyedStateBackend.java:973)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:758)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:732)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:443)
      at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:149)
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
      ... 7 more

        Attachments

        1. flink_open_files.txt, 707 kB, Sayat Satybaldiyev

            People

            • Assignee: Unassigned
            • Reporter: Sayat Satybaldiyev (sayatez)