Flink / FLINK-19630

Sinking data in [ORC] format into Hive via the legacy Table API causes an unexpected exception


    Description

      ENV:

      Flink version: 1.11.2

      Hive exec version: 2.0.1

      Hive file storage format: ORC

      SQL or DataStream: SQL API

      Kafka connector: custom Kafka connector based on the legacy API (TableSource / `org.apache.flink.types.Row`)

      Hive connector: follows the Flink Hive connector exactly (we only added some encapsulation on top of it)

      Using StreamingFileCommitter: YES

      Description:

         Try to execute the following SQL:

          """
            insert into hive_table (select * from kafka_table)
          """

         The Hive table DDL looks like:

          """
      CREATE TABLE `hive_table` (
        // some fields
      )
      PARTITIONED BY (
        `dt` string,
        `hour` string)
      STORED AS orc
      TBLPROPERTIES (
        'orc.compress' = 'SNAPPY',
        'type' = 'HIVE',
        'sink.partition-commit.trigger' = 'process-time',
        'sink.partition-commit.delay' = '1 h',
        'sink.partition-commit.policy.kind' = 'metastore,success-file'
      )
          """

      When this job starts to take a checkpoint snapshot, this weird exception occurs (screenshot attached).

      As we can see from the message, the owner thread should be the legacy source thread, but what is actually found is the StreamTask thread, which runs the whole first stage.

      So I checked the thread dump at once.

      (screenshot: the legacy Source Thread)

      (screenshot: the StreamTask Thread)

         Based on the thread dump and the exception message, I searched and read the relevant source code and then ran a test.

         Since the Kafka connector is custom, I tried to make the Kafka source a separate stage by changing its chaining strategy to NEVER. The resulting task topology is shown in the attached screenshot.

      Fortunately, it worked: no exception is thrown, and checkpoints complete successfully.

      So, from my perspective, something goes wrong when the Hive writing task and the legacy source task are chained together. The legacy source runs in its own separate thread, which may be the cause of the exception mentioned above.
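For reference, the workaround described above can be sketched at the job-entrypoint level. This is only an illustrative configuration sketch, assuming the job is built from a `StreamExecutionEnvironment`; `disableOperatorChaining()` is the coarse-grained public API that breaks all chains, whereas the actual test set the chaining strategy of just the custom source to NEVER inside the connector code:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingWorkaroundSketch {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Workaround sketch (coarse-grained): disable operator chaining so the
        // legacy Kafka source runs in its own task instead of being chained
        // with the Hive writer into one StreamTask. This keeps the StreamTask
        // thread and the legacy source thread from sharing a single chained task.
        env.disableOperatorChaining();

        // ... register kafka_table / hive_table and submit
        // "insert into hive_table (select * from kafka_table)" here ...

        env.execute("kafka-to-hive");
    }
}
```

Disabling chaining globally costs some serialization/throughput on every operator boundary, so scoping the change to the source operator (as in the test above) is the lighter-weight option.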

       

                                                                      

       

      Attachments

        1. image-2020-10-14-11-36-48-086.png
          626 kB
          Lsw_aka_laplace
        2. image-2020-10-14-11-41-53-379.png
          309 kB
          Lsw_aka_laplace
        3. image-2020-10-14-11-42-57-353.png
          291 kB
          Lsw_aka_laplace
        4. image-2020-10-14-11-48-51-310.png
          123 kB
          Lsw_aka_laplace


            People

              Assignee: lirui (Rui Li)
              Reporter: neighborhood (Lsw_aka_laplace)
