FLINK-16818
Optimize data skew when Flink writes data to a Hive dynamic partition table


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: Connectors / Hive
    • Labels: None

    Description

      I read data from a Hive source table with Flink SQL and then write it into a Hive target table. The target table is partitioned. When the data of one partition is particularly large, data skew occurs, resulting in a particularly long execution time.

      With the default configuration, the same SQL takes about five minutes with Hive on Spark and about 40 minutes with Flink.

      Example:

       

      -- the schema of myparttable
      CREATE TABLE myparttable (
        name string,
        age int
      ) PARTITIONED BY (
        type string,
        day string
      );

      INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
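
      One possible workaround until the sink itself is optimized: when the hot partition is known in advance, write it with a static partition INSERT, so that its rows are spread across all parallel writer tasks, and use dynamic partitioning only for the remaining partitions. A minimal Flink SQL sketch; the hot-partition values 'web' and '2020-03-26' here are hypothetical:

      -- write the known-hot partition statically; every parallel writer
      -- can emit files into this single partition directory
      INSERT OVERWRITE myparttable PARTITION (`type` = 'web', `day` = '2020-03-26')
      SELECT name, age FROM sourcetable
      WHERE `type` = 'web' AND `day` = '2020-03-26';

      -- write the remaining, smaller partitions dynamically
      INSERT OVERWRITE myparttable
      SELECT name, age, `type`, `day` FROM sourcetable
      WHERE NOT (`type` = 'web' AND `day` = '2020-03-26');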

       
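      For comparison, the usual HiveQL mitigation for this kind of write skew is to salt the shuffle key with DISTRIBUTE BY, so that one hot partition is spread over several reducers instead of funneling through a single one. A sketch of that general technique (the salt width of 10 is an arbitrary choice); Flink SQL has no equivalent clause today:

      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

      INSERT OVERWRITE TABLE myparttable PARTITION (type, day)
      SELECT name, age, type, day
      FROM sourcetable
      -- the random salt splits each (type, day) partition across up to 10 reducers
      DISTRIBUTE BY type, day, CAST(rand() * 10 AS INT);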

    People

    • Assignee: Unassigned
    • Reporter: zhangjun (Jun Zhang)
    • Votes: 1
    • Watchers: 5
