FLINK-16818

Optimize data skew when Flink writes data to a Hive dynamic partition table


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Connectors / Hive
    • Labels: None

    Description

      I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is partitioned. When one partition holds far more data than the others, data skew occurs, resulting in a very long execution time.

      With the default configuration, the same SQL takes five minutes on Hive on Spark and about 40 minutes on Flink.

      Example:

      -- the schema of myparttable
      CREATE TABLE myparttable (
        name STRING,
        age INT
      ) PARTITIONED BY (
        `type` STRING,
        `day` STRING
      );

      INSERT OVERWRITE myparttable SELECT name, age, `type`, `day` FROM sourcetable;

       
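      For reference, later Flink releases (1.11+) expose a filesystem/Hive sink option, sink.shuffle-by-partition.enable (default false), which controls whether rows are shuffled by the dynamic partition fields before reaching the sink; enabling it reduces the number of output files but can concentrate a hot partition on a single task. A hedged sketch using a dynamic table options hint; neither the option nor the hint syntax exists in the 1.10.0 release this report is filed against:

      -- assumes Flink 1.11+ with table.dynamic-table-options.enabled = true;
      -- 'false' keeps the write balanced across subtasks at the cost of more files
      INSERT OVERWRITE myparttable /*+ OPTIONS('sink.shuffle-by-partition.enable' = 'false') */
      SELECT name, age, `type`, `day` FROM sourcetable;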


            People

              Assignee: Unassigned
              Reporter: Jun Zhang (zhangjun)
              Votes: 1
              Watchers: 4
