[FLINK-19588] HBase ZooKeeper connections not released by the HBase batch table source during Flink job failover


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Abandoned
    • Affects Version/s: 1.11.0
    • Fix Version/s: None
    • Component/s: Connectors / HBase
    • Labels: None

    Description

      Hi, I created a SQL job that reads from an HBase table. The SQL is as follows:

      create table hbase_source_test(
        id bigint not null,
        f1 ROW<
          uid bigint,
          all_stay bigint>
      ) with (
        'connector.type' = 'hbase',
        'connector.version' = '1.4.3',
        'connector.table-name' = 'test_out',
        'connector.zookeeper.quorum' = 'testcluster-dn1:2181,testcluster-dn2:2181,testcluster-dn3:2181'
      );

      create table test_mysql(
        id BIGINT,
        `name` VARCHAR,
        COST DOUBLE
      ) with (
        'connector.type' = 'jdbc',
        'connector.url' = 'jdbc:mysql://192.168.1.22:3306/test',
        'connector.table' = 'test_result',
        'connector.username' = 'test',
        'connector.write.flush.interval' = '2s'
      );

      create view view_1 as
      select
        if (f1.uid is null, 0, f1.uid) as uid,
        proctime() as itime
      from hbase_source_test;

      insert into `test_mysql` select uid, '', 0 from view_1;

      The field types (uid BIGINT, all_stay BIGINT) declared in `hbase_source_test` do not match the columns in the actual HBase table (uid INT, all_stay INT). When this SQL job runs on a YARN cluster, it keeps failing over because of the data type mismatch, with the following HBase exception:

      Source: HBaseTableSource[schema=[id, f1], projectFields=[1]] (1/3) (b16b12602c2e7e442785b15c5d6509f9) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@110fff83. java.lang.IllegalArgumentException: offset (0) + length (8) exceed the capacity of the array: 4
       at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:779)
       ......................................
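
      For reference, this exception can be reproduced outside Flink with a few lines against the HBase `Bytes` utility. The snippet below is a minimal sketch (the class name is hypothetical), assuming the connector decodes BIGINT columns with `Bytes.toLong`:

      import org.apache.hadoop.hbase.util.Bytes;

      // Sketch: the cell was written as a 4-byte INT, but the Flink schema
      // declares BIGINT, so it gets decoded as an 8-byte long.
      public class TypeMismatchDemo {
          public static void main(String[] args) {
              byte[] intCell = Bytes.toBytes(42);  // 4 bytes, as the writer stored it
              long value = Bytes.toLong(intCell);  // expects 8 bytes and throws:
              // java.lang.IllegalArgumentException:
              //   offset (0) + length (8) exceed the capacity of the array: 4
              System.out.println(value);
          }
      }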
      

      Each time the job fails over, it reconnects to the HBase ZooKeeper, as the log shows:

      2020-09-16 07:56:33,383 INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Opening socket connection to server hr-rec2/10.221.114.150:2181
       2020-09-16 07:56:33,383 INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Socket connection established to hr-rec2/10.221.114.150:2181, initiating session
       2020-09-16 07:56:33,385 INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Session establishment complete on server hr-rec2/10.221.114.150:2181, sessionid = 0x3737ad5b2ccd9fd, negotiated timeout = 60000

      After the job has failed over many times, we find that the HBase ZooKeeper connection count keeps increasing (checked with `netstat -an | grep 2181 | wc -l`). Eventually the connection count grows into the thousands, which exhausts the HBase ZooKeeper's connections.
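
      The growth pattern suggests that the HBase Connection opened by the source task is never closed when the task fails. Below is a minimal sketch of the cleanup one would expect, assuming the connection is created in the reader's open path; the class and method names are illustrative and not the actual Flink connector code. The point is that the Connection, not just the Table, holds the ZooKeeper session, so it has to be closed in the task's close path even on failure:

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.TableName;
      import org.apache.hadoop.hbase.client.Connection;
      import org.apache.hadoop.hbase.client.ConnectionFactory;
      import org.apache.hadoop.hbase.client.Table;

      // Hypothetical reader that releases its HBase resources on close().
      public class LeakFreeHBaseReader implements AutoCloseable {

          private Connection connection;
          private Table table;

          public void open(String tableName) throws IOException {
              Configuration conf = HBaseConfiguration.create();
              // Creating the Connection opens the ZooKeeper session.
              connection = ConnectionFactory.createConnection(conf);
              table = connection.getTable(TableName.valueOf(tableName));
          }

          @Override
          public void close() throws IOException {
              // Closing only the Table is not enough; the Connection owns the
              // ZooKeeper session, so close it too, including when the task is
              // torn down because of a failure.
              if (table != null) {
                  table.close();
              }
              if (connection != null) {
                  connection.close();
              }
          }
      }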

          People

            Assignee: Unassigned
            Reporter: zouyunhe (KevinyhZou)
