Flink / FLINK-14567

Aggregate query with two or more group fields can't be written into HBase sink

Details

    Description

      Suppose we have an HBase table sink with a varchar rowkey (which is also the primary key) and a bigint column, and we want to write the result of the following query into the sink in upsert mode. The job fails during the primary key check with the exception "UpsertStreamTableSink requires that Table has a full primary keys if it is updated."

      select concat(f0, '-', f1) as key, sum(f2)
      from T1
      group by f0, f1
      

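      This happens in both the Blink planner and the old planner. The reason is that when a query runs in update mode, a primary key must be extractable from it so that it can be passed to UpsertStreamTableSink#setKeyFields. Here the group fields f0 and f1 do not appear in the output, only concat(f0, '-', f1) does, so no key can be extracted.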
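
The key check described above can be illustrated with a simplified model. This is only a sketch and not Flink's planner code: the class `KeyDerivationSketch`, the method `deriveKey`, and the output column names `total` and `EXPR$1` are hypothetical names chosen for illustration. The assumption is that a key can be derived only when every group field is forwarded unchanged to some output column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class KeyDerivationSketch {

    /**
     * Simplified model of key derivation for a grouped query.
     *
     * @param groupFields    the GROUP BY fields of the query
     * @param outputToSource maps each output column to the input field it
     *                       forwards unchanged, or to null if it is computed
     * @return the output columns forming the key, or empty if some group
     *         field is not forwarded to the output
     */
    static Optional<List<String>> deriveKey(
            List<String> groupFields, Map<String, String> outputToSource) {
        List<String> key = new ArrayList<>();
        for (String groupField : groupFields) {
            String forwardedAs = null;
            for (Map.Entry<String, String> e : outputToSource.entrySet()) {
                if (groupField.equals(e.getValue())) {
                    forwardedAs = e.getKey();
                    break;
                }
            }
            if (forwardedAs == null) {
                // A group field is missing from the output: no key can be derived.
                return Optional.empty();
            }
            key.add(forwardedAs);
        }
        return Optional.of(key);
    }

    public static void main(String[] args) {
        // select concat(f0, '-', f1) as key, sum(f2) ... group by f0, f1
        Map<String, String> failing = new LinkedHashMap<>();
        failing.put("key", null);     // computed expression
        failing.put("EXPR$1", null);  // aggregate
        System.out.println(deriveKey(List.of("f0", "f1"), failing).isPresent());

        // select f0, f1, sum(f2) as total ... group by f0, f1
        Map<String, String> working = new LinkedHashMap<>();
        working.put("f0", "f0");
        working.put("f1", "f1");
        working.put("total", null);
        System.out.println(deriveKey(List.of("f0", "f1"), working).isPresent());
    }
}
```

In this model the query from the description yields no key, while a query that selects f0 and f1 directly yields the key (f0, f1).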

      That is why we wanted to derive a primary key for concat in FLINK-14539. However, we found that concatenation does not preserve the primary key. For example, suppose the primary key is (f0, f1, f2), all of type varchar, and we have two distinct records ('a', 'b', 'c') and ('ab', '', 'c'). concat(f0, f1, f2) yields 'abc' for both, which means the concat result is no longer a primary key.
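
The collision can be reproduced in a few lines of plain Java. The sketch below is illustrative only: the class `ConcatKeyCollision` and its methods are hypothetical, and the length-prefixed encoding is one possible way to build an unambiguous composite key, not something this issue or Flink prescribes. It also shows that a fixed separator, as in concat(f0, '-', f1), remains ambiguous when the values themselves may contain the separator.

```java
import java.util.Arrays;
import java.util.List;

public class ConcatKeyCollision {

    // Plain concatenation, mirroring concat(f0, f1, f2) from the description.
    static String concatKey(List<String> fields) {
        return String.join("", fields);
    }

    // Concatenation with a fixed separator, mirroring concat(f0, '-', f1).
    static String separatorKey(List<String> fields) {
        return String.join("-", fields);
    }

    // Hypothetical alternative: prefix each field with its length so that
    // field boundaries stay recoverable and distinct tuples stay distinct.
    static String lengthPrefixedKey(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            sb.append(field.length()).append(':').append(field);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> r1 = Arrays.asList("a", "b", "c");
        List<String> r2 = Arrays.asList("ab", "", "c");

        // Both records concatenate to "abc", so the key is not unique.
        System.out.println(concatKey(r1).equals(concatKey(r2)));

        // A separator collides too once values may contain it.
        List<String> s1 = Arrays.asList("a-", "b");
        List<String> s2 = Arrays.asList("a", "-b");
        System.out.println(separatorKey(s1).equals(separatorKey(s2)));

        // Length prefixes keep the two original records apart.
        System.out.println(lengthPrefixedKey(r1).equals(lengthPrefixedKey(r2)));
    }
}
```

Whether such an encoding is acceptable as an HBase rowkey depends on the use case, since it changes the stored key format and its sort order.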

      So the open question is: how can we properly support the HBase sink, or similar use cases?

            People

              Assignee: Unassigned
              Reporter: Jark Wu (jark)
              Votes: 0
              Watchers: 7
