FLINK-36429

Enhancing Flink History Server File Storage and Retrieval with RocksDB

Details

    Description

      Currently, when a Flink job finishes, it writes its archive as a single file that maps paths to JSON content. The Flink History Server (FHS) pulls each job archive to the machine it runs on and expands it into a local directory whose structure follows the contents of that single archive file.
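
To make the expansion step concrete, here is a minimal sketch of turning one archive file into a directory tree. It is an illustration, not the FHS implementation: the field names `archive`, `path`, and `json`, the `.json` file suffix, and the sample paths are all assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical archive content. The field names and paths are assumptions
# chosen for illustration, not taken from the ticket.
SAMPLE_ARCHIVE = json.dumps({
    "archive": [
        {"path": "/jobs/overview", "json": json.dumps({"jobs": []})},
        {"path": "/jobs/abc123/config", "json": json.dumps({"jid": "abc123"})},
        {"path": "/jobs/abc123/vertices/v1/subtasks/0/attempts/0",
         "json": json.dumps({"subtask": 0})},
    ]
})


def expand_archive(archive_text: str, target_dir: Path) -> int:
    """Write every archived entry to <target_dir>/<path>.json.

    Returns the number of files written.
    """
    entries = json.loads(archive_text)["archive"]
    for entry in entries:
        out_file = target_dir / (entry["path"].lstrip("/") + ".json")
        # Every path segment becomes a directory, which costs one inode each.
        out_file.parent.mkdir(parents=True, exist_ok=True)
        out_file.write_text(entry["json"], encoding="utf-8")
    return len(entries)


def count_directories(root: Path) -> int:
    """Count all directories below root (root itself is not counted)."""
    return sum(1 for p in root.rglob("*") if p.is_dir())
```

In this sample, three archived entries already require seven directories, which is the growth pattern the description refers to.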

      Because of how the FHS lays out these files, a large number of directories are created in the local file system. This layout becomes inefficient and slow as the volume of job archives grows, creating bottlenecks when navigating and retrieving job data.

      To illustrate the problem of inode usage, let’s consider a scenario where there are 5000 subtasks. Each subtask creates its own directory, and within each subtask directory, there are additional directories that might store only a single file. This structure rapidly increases the number of inodes consumed.
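
The arithmetic behind this scenario can be sketched as follows. Only the 5000-subtask figure comes from the description; the number of nested directories per subtask and the number of files per nested directory are hypothetical parameters, since the ticket does not state them.

```python
def estimate_inodes(subtasks: int,
                    nested_dirs_per_subtask: int,
                    files_per_nested_dir: int) -> int:
    """Rough inode estimate for one job's expanded archive.

    Each subtask costs one inode for its own directory, one per nested
    directory, and one per file stored in those nested directories.
    """
    per_subtask = (
        1
        + nested_dirs_per_subtask
        + nested_dirs_per_subtask * files_per_nested_dir
    )
    return subtasks * per_subtask


# 5000 subtasks, assuming a single nested directory holding a single file.
single_job = estimate_inodes(5000, nested_dirs_per_subtask=1, files_per_nested_dir=1)
```

Under those assumptions a single job consumes 15,000 inodes for subtask data alone, and the total scales linearly with the number of archived jobs.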

      Integrating RocksDB, a high-performance embedded key-value database, aims to resolve these issues by offering faster data access and better scalability. The integration is expected to significantly improve the operational efficiency of the FHS by speeding up data retrieval and enabling a larger cache on local Kubernetes deployments, thus overcoming inode limitations.
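
One way to picture the key-value layout is the sketch below. It deliberately does not use RocksDB: a sorted in-memory structure stands in for it, relying only on the fact that RocksDB keeps keys ordered and supports prefix iteration. The class name, the key scheme (`<job id><path>`), and the sample paths are hypothetical and are not specified by the ticket.

```python
from bisect import bisect_left, insort


class InMemoryArchiveStore:
    """Stand-in for an ordered key-value store (NOT RocksDB itself).

    Each archived path becomes one key instead of one file plus its
    parent directories, so no extra inodes are needed per entry.
    """

    def __init__(self) -> None:
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> JSON string

    def put(self, job_id: str, path: str, body: str) -> None:
        key = f"{job_id}{path}"  # hypothetical key scheme
        if key not in self._values:
            insort(self._keys, key)
        self._values[key] = body

    def get(self, job_id: str, path: str):
        """Point lookup; returns None when the entry does not exist."""
        return self._values.get(f"{job_id}{path}")

    def scan_prefix(self, prefix: str):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._values[key]
            i += 1
```

With this layout, serving a single REST path is a point lookup, and listing everything that belongs to one job is a prefix scan over that job's keys, all stored in the database's own files rather than in one directory per subtask.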

          People

            Assignee: Unassigned
            Reporter: Xiaowen Sun (shawnsun)

              Time Tracking

                Original Estimate: 2,016h
                Remaining Estimate: 2,016h
                Time Spent: Not Specified