Flink / FLINK-9367

Truncate() in BucketingSink is only allowed after hadoop2.7

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: Connectors / Common
    • Labels: None

    Description

      When writing to HDFS with BucketingSink, truncate() is only available on Hadoop 2.7 and later.
      On older Hadoop versions, if some tasks fail, a ".valid-length" file is created instead.

      The problem is that anyone else who wants to use the data in HDFS must know how to handle the ".valid-length" file; otherwise, the data they read may not be exactly-once.
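
To make the burden on downstream readers concrete, here is a minimal sketch of what a consumer has to do. It uses local files rather than HDFS, and it assumes that the sidecar file is named by wrapping the data file name in a prefix and suffix and that it holds the valid length as plain decimal text. Both the naming and the encoding are assumptions for illustration and may not match what BucketingSink actually writes; `read_valid_bytes` is a hypothetical helper, not a Flink API.

```python
from pathlib import Path


def read_valid_bytes(data_path, prefix="_", suffix=".valid-length"):
    """Return only the bytes of data_path that the sidecar marks as valid.

    Assumption: the sidecar sits next to the data file, is named
    prefix + <file name> + suffix, and contains the length as decimal text.
    """
    data_path = Path(data_path)
    sidecar = data_path.with_name(prefix + data_path.name + suffix)
    data = data_path.read_bytes()
    if not sidecar.exists():
        # No sidecar: the whole file is treated as valid.
        return data
    valid_length = int(sidecar.read_text(encoding="ascii").strip())
    # Everything past valid_length was written after the last checkpoint
    # and must be ignored to avoid reading duplicate or partial records.
    return data[:valid_length]
```

A reader that skips this step and consumes the whole file would also see the bytes written after the last successful checkpoint.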

      I think this is inconvenient for other people who use the data. Why not read the in-progress file and write a new file when restoring, instead of writing a ".valid-length" file?

      That way, others who use the data in HDFS would not need to know how to handle the ".valid-length" file.
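
The proposal could look roughly like the sketch below: on restore, copy the checkpointed prefix of the in-progress file into a new file and drop the original. This is only an illustration on local files, not Flink or Hadoop code; `restore_by_rewrite`, its parameters, and the temporary-file naming are all hypothetical, and `final_path` is assumed to differ from `in_progress_path`.

```python
import os


def restore_by_rewrite(in_progress_path, final_path, valid_length,
                       chunk_size=1 << 20):
    """Copy the first valid_length bytes into final_path, then delete the source.

    Hypothetical stand-in for truncate(): readers only ever see final_path,
    so no sidecar length file is needed.
    """
    tmp_path = final_path + ".tmp"  # assumed naming, for illustration only
    remaining = valid_length
    with open(in_progress_path, "rb") as src, open(tmp_path, "wb") as dst:
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise IOError("in-progress file is shorter than valid_length")
            dst.write(chunk)
            remaining -= len(chunk)
    # Publish the rewritten file in one step, then remove the stale original.
    os.replace(tmp_path, final_path)
    os.remove(in_progress_path)
```

Unlike a truncate call or a small sidecar file, this approach rewrites every valid byte, so its cost on restore grows with the size of the in-progress file.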

            People

              Assignee: Unassigned
              Reporter: Xinyu Zhang (zhangxinyu)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: