Details
- Improvement
- Status: Closed
- Major
- Resolution: Won't Do
- 1.5.0
- None
- None
Description
When BucketingSink writes to HDFS, it can only call truncate() on Hadoop 2.7 or later.
On older Hadoop versions, when a task fails and is restored, the sink instead writes a ".valid-length" file that records how many bytes of the data file are valid.
The problem is that anyone else who reads the data in HDFS must know how to handle the ".valid-length" file; otherwise they may read bytes past the valid length, and the data is no longer exactly-once.
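To make the burden on downstream readers concrete, here is a minimal sketch of what a consumer has to do today. It runs against the local filesystem rather than HDFS, and it makes two assumptions that are not confirmed by this issue: that the sidecar is named by appending ".valid-length" to the data file name, and that it holds the valid length as decimal ASCII text. The actual naming and encoding depend on the sink's configuration, and `read_valid_bytes` is a hypothetical helper, not part of any library.

```python
from pathlib import Path

# Assumption: sidecar name = data file name + this suffix (the real sink
# may add a prefix or use a differently configured suffix).
VALID_LENGTH_SUFFIX = ".valid-length"


def read_valid_bytes(data_path: Path) -> bytes:
    """Return only the bytes of data_path that the sidecar marks as valid.

    If no sidecar exists, the whole file is treated as valid.
    """
    sidecar = data_path.with_name(data_path.name + VALID_LENGTH_SUFFIX)
    data = data_path.read_bytes()
    if not sidecar.exists():
        return data
    # Assumption: the sidecar stores the length as decimal ASCII text.
    valid_length = int(sidecar.read_text(encoding="ascii").strip())
    return data[:valid_length]
```

A reader that skips this check and consumes the whole file would also see the bytes written after the last successful checkpoint, which is exactly the case the description warns about.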
I think this is inconvenient for other consumers of the data. Why not read the in-progress file on restore and write its valid part to a new file, instead of writing a ".valid-length" file?
That way, others who use the data in HDFS would not need to know about the ".valid-length" file at all.
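The proposal can be sketched roughly as follows. This is only an illustration on the local filesystem, not BucketingSink code: `restore_by_rewrite`, its parameters, and the ".tmp" staging name are all hypothetical, and the valid length is assumed to come from the restored checkpoint state.

```python
import os
from pathlib import Path


def restore_by_rewrite(in_progress: Path, final_path: Path,
                       valid_length: int, chunk_size: int = 1 << 20) -> None:
    """Copy the first valid_length bytes of in_progress into final_path,
    then delete in_progress. No sidecar file is written."""
    # Hypothetical staging name, so readers never see a half-written file.
    tmp = final_path.with_name(final_path.name + ".tmp")
    remaining = valid_length
    with in_progress.open("rb") as src, tmp.open("wb") as dst:
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise IOError("in-progress file is shorter than valid_length")
            dst.write(chunk)
            remaining -= len(chunk)
    # Publish the rewritten file, then drop the file with the invalid tail.
    os.replace(tmp, final_path)
    in_progress.unlink()
```

The trade-off is that the copy costs I/O proportional to the valid length on every restore, whereas writing a ".valid-length" file only costs a few bytes.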