  1. Flink
  2. FLINK-25318 Improvement of scheduler and execution for Flink OLAP
  3. FLINK-25335

HiveSourceFileEnumerator should fetch splits asynchronously


Details

    Description

      When an OLAP query is submitted by the Flink client to a Flink session cluster, the JobMaster starts scheduling and enumerates the Hive source splits with `HiveSourceFileEnumerator`, and only then deploys the query tasks and executes them. If the source table has many partitions and the partition files are large, split enumeration takes a long time. This blocks task deployment and execution for that whole period, and the job does not show up on the dashboard in the meantime.

      It would be better to enumerate the Hive splits asynchronously while the query tasks are being deployed and started. Once deployment has finished, the source operators fetch splits and read data while split enumeration continues in the background.
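
      The proposal can be illustrated with a minimal, self-contained sketch in plain Java. It is not Flink code and not the actual `HiveSourceFileEnumerator`: the class `AsyncSplitEnumerationSketch`, the methods `listSplits` and `runDemo`, the `END_OF_SPLITS` sentinel, and the split names are all hypothetical, chosen only to show the pattern. A background thread enumerates splits partition by partition and publishes them to a queue, while the caller (standing in for deployed source operators) starts consuming without waiting for enumeration to finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not part of Flink or its Hive connector.
class AsyncSplitEnumerationSketch {

    // Sentinel that tells consumers enumeration has finished (an assumption of this sketch).
    private static final String END_OF_SPLITS = "<end-of-splits>";

    // Stand-in for the expensive listing of one partition's files.
    static List<String> listSplits(int partition, int splitsPerPartition) {
        List<String> splits = new ArrayList<>();
        for (int i = 0; i < splitsPerPartition; i++) {
            splits.add("partition-" + partition + "/split-" + i);
        }
        return splits;
    }

    // Enumerates splits on a background thread and consumes them as they arrive.
    static List<String> runDemo(int partitions, int splitsPerPartition) {
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        ExecutorService enumerator = Executors.newSingleThreadExecutor();

        // Kick off enumeration; this call returns immediately, so the
        // caller is free to continue with task deployment.
        enumerator.execute(() -> {
            try {
                for (int p = 0; p < partitions; p++) {
                    pending.addAll(listSplits(p, splitsPerPartition));
                }
            } finally {
                // Always signal completion so consumers do not block forever.
                pending.add(END_OF_SPLITS);
            }
        });

        // Stand-in for a deployed source operator: take splits as soon as
        // they are available, while enumeration may still be running.
        List<String> consumed = new ArrayList<>();
        try {
            String split;
            while (!(split = pending.take()).equals(END_OF_SPLITS)) {
                consumed.add(split);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for splits", e);
        } finally {
            enumerator.shutdown();
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3, 2));
    }
}
```

      In this sketch the consumer only blocks when no split is available yet, rather than waiting for the full enumeration up front. Flink's source API exposes `SplitEnumeratorContext#callAsync` for running work off the coordinator thread; whether a fix for this issue would use it is not stated here.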

            People

              Assignee: Unassigned
              Reporter: zouyunhe (KevinyhZou)
              Votes: 0
              Watchers: 3
