[FLINK-9367] Truncate() in BucketingSink is only allowed after hadoop2.7 - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: 1.5.0
Fix Version/s: None
Component/s: Connectors / Common
Labels: None

Description

When writing to HDFS with BucketingSink, truncate() is only available on Hadoop 2.7 and later.
On older Hadoop versions, if a task fails, a ".valid-length" file is created instead.

The problem is that anyone else who wants to use the data in HDFS must know how to handle the ".valid-length" file; otherwise, the data they read may not be exactly-once.
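
To illustrate what a downstream reader has to do, here is a minimal sketch in Java. It uses java.nio on a local filesystem as a stand-in for the Hadoop FileSystem API. The class name ValidLengthReader and the method readValidBytes are hypothetical, and the sketch assumes the companion file stores the valid length as a plain decimal string; the actual encoding written by BucketingSink may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Flink or Hadoop.
public class ValidLengthReader {

    /**
     * Returns only the leading bytes of dataFile that lengthFile marks as valid.
     * Assumption: lengthFile holds the valid length as a plain decimal string.
     */
    public static byte[] readValidBytes(Path dataFile, Path lengthFile) {
        try {
            String text = new String(Files.readAllBytes(lengthFile), StandardCharsets.UTF_8).trim();
            long validLength = Long.parseLong(text);
            long fileSize = Files.size(dataFile);
            if (validLength < 0 || validLength > fileSize) {
                throw new IllegalStateException(
                        "valid length " + validLength + " is outside file size " + fileSize);
            }
            // A sketch only: files larger than 2 GiB would need streaming instead.
            byte[] valid = new byte[Math.toIntExact(validLength)];
            try (InputStream in = Files.newInputStream(dataFile)) {
                int offset = 0;
                while (offset < valid.length) {
                    int n = in.read(valid, offset, valid.length - offset);
                    if (n < 0) {
                        throw new IOException("unexpected end of file");
                    }
                    offset += n;
                }
            }
            return valid; // bytes past validLength are ignored
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader that skips this step and consumes the whole data file would also see the bytes written after the last successful checkpoint.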

I think this is inconvenient for other consumers of the data. Why not read the in-progress file and write a new file when restoring, instead of writing a ".valid-length" file?

That way, others who use the data in HDFS would not need to know how to deal with the ".valid-length" file.
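
The proposal could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked pull request: java.nio on a local filesystem stands in for the Hadoop FileSystem API, and the class name RestoreByRewrite, the method copyValidPrefix, and its parameters are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Flink or Hadoop.
public class RestoreByRewrite {

    /**
     * Copies the first validLength bytes of inProgress into target,
     * so that target contains only data covered by the last checkpoint.
     */
    public static void copyValidPrefix(Path inProgress, Path target, long validLength) {
        if (validLength < 0) {
            throw new IllegalArgumentException("validLength must not be negative");
        }
        try (InputStream in = Files.newInputStream(inProgress);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long remaining = validLength;
            while (remaining > 0) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n < 0) {
                    throw new IOException("in-progress file is shorter than the valid length");
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike truncate(), this copies every valid byte, so the cost of a restore grows with the size of the in-progress file.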

Issue Links

links to GitHub Pull Request #6108

People

Assignee: Unassigned

Reporter: Xinyu Zhang

Votes: 0

Watchers: 3

Dates

Created: 15/May/18 11:21

Updated: 19/Jun/18 12:03

Resolved: 19/Jun/18 12:03