FLINK-15206

Support dynamic catalog table for truly unified SQL job


Details

    Description

      Currently, if users have both an online and an offline job with the same business logic in Flink SQL, their codebase is still not unified. They have to keep two SQL statements whose only difference is the source (and/or sink) table, each with different parameters. E.g.

      -- online job
      INSERT INTO x SELECT * FROM kafka_table (starting time) ...;

      -- offline backfill job
      INSERT INTO x SELECT * FROM hive_table (starting and ending time) ...;
      

      We can introduce a "dynamic catalog table". A dynamic catalog table acts like a view: it is an abstract table over multiple actual tables behind it, and which actual table it points to can be switched by configuration flags. When executing a job, depending on the configuration, the dynamic catalog table resolves to an actual source table.
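      As a sketch of what the declaration might look like (this DDL syntax is purely hypothetical and does not exist in Flink SQL; the catalog and database names are made up for illustration):

      -- Hypothetical DDL sketch; no such syntax exists in Flink SQL today.
      -- It declares one logical table backed by two actual catalog tables,
      -- with the active one chosen at execution time by a configuration
      -- flag ('execution.runtime-mode' here is an assumed flag name).
      CREATE DYNAMIC TABLE my_source_dynamic_table
      SWITCH ON 'execution.runtime-mode' (
        WHEN 'streaming' THEN my_catalog.my_db.kafka_table,
        WHEN 'batch'     THEN my_catalog.my_db.hive_table
      );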

      A use case for this is the example given above: when executed in streaming mode, my_source_dynamic_table should point to a Kafka catalog table with a new starting position, and in batch mode, my_source_dynamic_table should point to a Hive catalog table with starting/ending positions.
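      With such a table in place, the two statements above collapse into one, and only a configuration flag changes between runs. In the sketch below, SET is the existing Flink SQL client command, but using execution.runtime-mode as the switch for the dynamic table is an assumption of this sketch, not something the proposal specifies:

      -- Streaming run: my_source_dynamic_table would resolve to kafka_table.
      SET 'execution.runtime-mode' = 'streaming';
      INSERT INTO x SELECT * FROM my_source_dynamic_table;

      -- Batch backfill run: my_source_dynamic_table would resolve to hive_table.
      SET 'execution.runtime-mode' = 'batch';
      INSERT INTO x SELECT * FROM my_source_dynamic_table;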

      One thing to note is that the starting position of kafka_table and the starting/ending positions of hive_table are different for every run. This needs more thought on how we can accommodate it.
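      One possible direction, offered here as an assumption rather than part of the proposal, is to supply the per-run positions at query time instead of baking them into the catalog table. The OPTIONS hint below is the dynamic table options feature Flink SQL gained later; scan.startup.mode and scan.startup.timestamp-millis are real Kafka connector options, while applying them through a dynamic catalog table, and the partition column dt, are hypothetical:

      -- Streaming run: pass this run's Kafka starting position with the query.
      INSERT INTO x
      SELECT * FROM my_source_dynamic_table
        /*+ OPTIONS('scan.startup.mode' = 'timestamp',
                    'scan.startup.timestamp-millis' = '1576540800000') */;

      -- Batch backfill run: bound the Hive read with an ordinary partition predicate.
      INSERT INTO x
      SELECT * FROM my_source_dynamic_table
      WHERE dt BETWEEN '2019-12-01' AND '2019-12-07';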


          People

            Assignee: Unassigned
            Reporter: Bowen Li (phoenixjiangnan)
            Votes: 0
            Watchers: 4
