Details
Type: New Feature
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Description
Currently, if users have both an online and an offline job with the same business logic in Flink SQL, their codebase is still not unified. They have to maintain two SQL statements whose only difference is the source and/or sink table (with different parameters). E.g.
// online job
insert into x select * from kafka_table (starting time) ...;
// offline backfill job
insert into x select * from hive_table (starting and ending time) ...;
We can introduce a "dynamic catalog table". It acts like a view: an abstract table backed by multiple actual tables, which can be switched via configuration flags. When a job is executed, the dynamic catalog table points to one actual source table, depending on the configuration.
A use case for this is the example given above: when executed in streaming mode, my_source_dynamic_table should point to a Kafka catalog table with a new starting position, and in batch mode, it should point to a Hive catalog table with starting/ending positions.
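To make the idea concrete, here is a minimal sketch of the resolution step in plain Python. It does not use any Flink API. The names DynamicCatalogTable, ExecutionMode, resolve and build_statement are hypothetical and only illustrate how one logical table name could map to a different actual table per execution mode. Starting/ending positions are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(Enum):
    """Hypothetical configuration flag that selects the backing table."""
    STREAMING = "streaming"
    BATCH = "batch"


@dataclass(frozen=True)
class DynamicCatalogTable:
    """Hypothetical abstract table backed by one actual table per mode."""
    name: str
    targets: dict  # maps ExecutionMode -> name of the actual table

    def resolve(self, mode: ExecutionMode) -> str:
        """Return the actual table this dynamic table points to for `mode`."""
        try:
            return self.targets[mode]
        except KeyError:
            raise ValueError(
                f"{self.name} has no table configured for mode {mode.value}"
            ) from None


def build_statement(table: DynamicCatalogTable, mode: ExecutionMode) -> str:
    """Build the single shared statement against the resolved table."""
    return f"INSERT INTO x SELECT * FROM {table.resolve(mode)}"


# Assumed mapping, taken from the example in this ticket.
my_source_dynamic_table = DynamicCatalogTable(
    name="my_source_dynamic_table",
    targets={
        ExecutionMode.STREAMING: "kafka_table",
        ExecutionMode.BATCH: "hive_table",
    },
)

print(build_statement(my_source_dynamic_table, ExecutionMode.STREAMING))
print(build_statement(my_source_dynamic_table, ExecutionMode.BATCH))
```

With this shape, the business logic lives in one statement and only the mode flag changes between the online job and the backfill job.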
One thing to note is that the starting position of kafka_table and the starting/ending positions of hive_table are different every time. How to accommodate that needs more thought.