Flink / FLINK-25318

Improvement of scheduler and execution for Flink OLAP

Details

    Description

      We use Flink to run OLAP queries. We launch a Flink session cluster, submit batch jobs to the cluster as OLAP queries, and fetch the jobs' results. OLAP jobs are generally small queries that finish within seconds or milliseconds, and users typically submit multiple jobs to the session cluster concurrently. We found that the QPS and latency of jobs degrade significantly once tens of jobs are running at the same time, even when each query processes very little data. We will post benchmark results for the latest version later.
      After discussing this with xtsong, and with thanks for his advice, we created this issue to track and manage Flink OLAP related improvements. Other users and developers are welcome to create Flink OLAP related sub-tasks here. Thanks.
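
      The workload described above (many small queries submitted concurrently, with QPS and latency as the metrics of interest) can be sketched with a minimal load harness. This is only an illustration under stated assumptions, not the benchmark referred to in this issue: the class `OlapLoadSketch`, its `run` method, and the sleeping placeholder query are hypothetical, and the `Callable` stands in for whatever client call submits a job to the session cluster and fetches its result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness: runs a placeholder "query" concurrently and reports QPS and latency. */
final class OlapLoadSketch {

    /** Aggregated numbers for one load run. */
    static final class Result {
        final int completed;
        final double qps;
        final double avgLatencyMillis;

        Result(int completed, double qps, double avgLatencyMillis) {
            this.completed = completed;
            this.qps = qps;
            this.avgLatencyMillis = avgLatencyMillis;
        }
    }

    /**
     * Submits totalQueries invocations of query, at most `concurrency` at a time.
     * The query is an assumption: replace it with the real submit-and-fetch call.
     */
    static Result run(Callable<?> query, int concurrency, int totalQueries) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<Future<Long>> latencies = new ArrayList<>();
            long start = System.nanoTime();
            for (int i = 0; i < totalQueries; i++) {
                latencies.add(pool.submit(() -> {
                    long t0 = System.nanoTime();
                    query.call();                      // one "OLAP query"
                    return System.nanoTime() - t0;     // per-query latency in nanoseconds
                }));
            }
            long totalLatencyNanos = 0;
            for (Future<Long> f : latencies) {
                totalLatencyNanos += f.get();          // waits for completion, rethrows failures
            }
            long elapsedNanos = Math.max(1, System.nanoTime() - start);
            double qps = totalQueries / (elapsedNanos / 1e9);
            double avgMillis = totalQueries == 0 ? 0.0 : (totalLatencyNanos / 1e6) / totalQueries;
            return new Result(totalQueries, qps, avgMillis);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder query: sleeps 5 ms instead of talking to a cluster.
        Result r = run(() -> { Thread.sleep(5); return null; }, 10, 100);
        System.out.printf("completed=%d qps=%.1f avgLatencyMs=%.2f%n",
                r.completed, r.qps, r.avgLatencyMillis);
    }
}
```

      Raising `concurrency` from 1 into the tens while keeping the placeholder query cheap is one way to observe the kind of degradation described above, once the placeholder is replaced by a real submission call.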

        Sub-Tasks

          1. Enable TCP connection reuse across multiple jobs. [Sub-task, Closed, Yangze Guo]
          2. Support listen and notify mechanism for PartitionRequest [Sub-task, Closed, Yangze Guo]
          3. System classloader memory leak after loading too many codegen classes. [Sub-task, Open, Unassigned]
          4. Improvement of reuse segments for join/agg/sort operators in TaskManager for flink olap queries [Sub-task, Open, Unassigned]
          5. Improvement of execution graph store in flink session cluster for jobs [Sub-task, Closed, Fang Yong]
          6. Manage and share gateways of taskmanagers between jobs in session cluster [Sub-task, Open, Unassigned]
          7. HiveSourceFileEnumerator should fetch splits asynchronously [Sub-task, Open, Unassigned]
          8. Improvement of connection from TM to JM in session cluster [Sub-task, Open, Unassigned]
          9. Add thread dump feature for jobmanager [Sub-task, Closed, Zhanghao Chen]
          10. Too many JM logs in flink session cluster for olap queries [Sub-task, Open, Unassigned]
          11. TaskExecutor always creates local file for task even when local state store is not used [Sub-task, Resolved, Junfan Zhang]
          12. ExecutionGraphInfoStore in session cluster should split failed and successful jobs [Sub-task, Open, Zhanghao Chen]
          13. Ignore buffer pools which have no floating buffer in buffer redistributing [Sub-task, Closed, Yangze Guo]
          14. Remove the redundant serialization of RPC invocation at Flink side. [Sub-task, Closed, Yangze Guo]
          15. Memory pages in LazyMemorySegmentPool should be clear after they are released to MemoryManager [Sub-task, Closed, Fang Yong]
          16. Optimize the time of fetching job status in the job submission of session cluster [Sub-task, Closed, Yangze Guo]
          17. Parallelized heavy serialization operations in StreamingJobGraphGenerator [Sub-task, Closed, Yangze Guo]
          18. Improve cache hit rate of generated class [Sub-task, Open, Dan Zou]
          19. Add WebSocket in Dispatcher to support olap query submission and push results in session cluster [Sub-task, Open, Unassigned]
          20. Improve to reuse threads in TaskManager for different tasks between jobs [Sub-task, Open, Unassigned]
          21. Support customized listener during task manager startup [Sub-task, Open, Unassigned]
          22. Use default classloader in jobmanager when there are no user jars for job [Sub-task, Closed, Fang Yong]
          23. job name should not always be `collect` submitted by sql client [Sub-task, Resolved, xiangyu feng]
          24. Using ZGC in JDK17 to solve long time class unloading STW [Sub-task, Closed, Unassigned]
          25. Support concurrency control when submitting OLAP jobs to Dispatcher [Sub-task, Open, Unassigned]
          26. SlotManager supports pulling up all TaskManagers at initialization [Sub-task, Closed, Unassigned]
          27. Add min number of slots configuration to limit total number of slots [Sub-task, Closed, xiangyu feng]
          28. Improve Flink UI's time precision from second level to millisecond level [Sub-task, Closed, Jufang He]
          29. Align the job execution result fetching timeout in CollectResultFetcher with akka timeout [Sub-task, Closed, Yangze Guo]
          30. Support plan cache for DQL in SQL Gateway [Sub-task, Resolved, Dan Zou]
          31. Upload python jar when sql contains python udf jar [Sub-task, Closed, Yangze Guo]
          32. Move the serialization of ShuffleDescriptorGroup out of the RPC main thread [Sub-task, Closed, dizhou cao]
          33. Generate the same code for the same logic [Sub-task, Resolved, Dan Zou]
          34. Use default classloader in TaskManager when there are no user jars for job [Sub-task, Open, Dan Zou]

            People

              Assignee: Unassigned
              Reporter: Fang Yong (zjureel)
              Votes: 0
              Watchers: 48