FLINK-16818
Optimize data skew when Flink writes data to a Hive dynamic partition table


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Connectors / Hive
    • Labels: None

    Description

      I read data from a Hive source table with Flink SQL and then write it into a Hive target table. The target table is partitioned. When the data of one partition is particularly large, data skew occurs, resulting in a particularly long execution time.

      With the default configuration, the same SQL takes about five minutes with Hive on Spark and about 40 minutes with Flink.

      Example:

       

      -- the schema of myparttable
      CREATE TABLE myparttable (
        name string,
        age int
      ) PARTITIONED BY (
        type string,
        day string
      );

      INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
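
      One possible workaround until the sink itself is optimized: when the hot partition is known in advance, write it with a static partition INSERT, so that its rows are spread across all parallel writer tasks, and use dynamic partitioning only for the remaining partitions. A minimal Flink SQL sketch; the hot-partition values 'web' and '2020-03-26' here are hypothetical:

      -- write the known-hot partition statically; every parallel writer
      -- can emit files into this single partition directory
      INSERT OVERWRITE myparttable PARTITION (`type` = 'web', `day` = '2020-03-26')
      SELECT name, age FROM sourcetable
      WHERE `type` = 'web' AND `day` = '2020-03-26';

      -- write the remaining, smaller partitions dynamically
      INSERT OVERWRITE myparttable
      SELECT name, age, `type`, `day` FROM sourcetable
      WHERE NOT (`type` = 'web' AND `day` = '2020-03-26');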

       
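      For comparison, the usual HiveQL mitigation for this kind of write skew is to salt the shuffle key with DISTRIBUTE BY, so that one hot partition is spread over several reducers instead of funneling through a single one. A sketch of that general technique (the salt width of 10 is an arbitrary choice); Flink SQL has no equivalent clause today:

      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

      INSERT OVERWRITE TABLE myparttable PARTITION (type, day)
      SELECT name, age, type, day
      FROM sourcetable
      -- the random salt splits each (type, day) partition across up to 10 reducers
      DISTRIBUTE BY type, day, CAST(rand() * 10 AS INT);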

    People

    • Assignee: Unassigned
    • Reporter: zhangjun (Jun Zhang)
    • Votes: 1
    • Watchers: 5
