Flink / FLINK-18592

StreamingFileSink fails due to truncating HDFS file failure

    Description

      I hit this issue on Flink 1.10.1, running on YARN (3.0.0-cdh6.3.2) with StreamingFileSink.

      The relevant code looks like this:

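      	import java.util.concurrent.TimeUnit;
      	import org.apache.flink.api.common.serialization.SimpleStringEncoder;
      	import org.apache.flink.core.fs.Path;
      	import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
      	import org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig;
      	import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
      	import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

      	// Builds a row-format sink that writes each record as a line of text under `dir`.
      	public static <IN> StreamingFileSink<IN> build(String dir, BucketAssigner<IN, String> assigner, String prefix) {
      		return StreamingFileSink.forRowFormat(new Path(dir), new SimpleStringEncoder<IN>())
      			.withRollingPolicy(
      				DefaultRollingPolicy
      					.builder()
      					.withRolloverInterval(TimeUnit.HOURS.toMillis(2)) // roll after 2 hours
      					.withInactivityInterval(TimeUnit.MINUTES.toMillis(10)) // or after 10 idle minutes
      					.withMaxPartSize(1024L * 1024L * 1024L * 50) // or at 50 GiB
      					.build())
      			.withBucketAssigner(assigner)
      			.withOutputFileConfig(OutputFileConfig.builder().withPartPrefix(prefix).build())
      			.build();
      	}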
      

      The error is:

      java.io.IOException:
      Problem while truncating file:
      hdfs:///business_log/hashtag/2020-06-25/.hashtag-122-37.inprogress.8e65f69c-b5ba-4466-a844-ccc0a5a93de2
      

      Because of this error, the job cannot restart from the latest checkpoint or savepoint.

      My current workaround is to retain the latest 3 checkpoints; if the restore fails, I manually restart from the penultimate checkpoint.
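
The fallback step of this workaround can be sketched as below. This is only an illustration, not code from the job: it assumes the retained checkpoints are kept with the `state.checkpoints.num-retained` setting and appear as directories named `chk-<id>`, and the class name `PenultimateCheckpoint`, its helper methods, and the sample directory names are all hypothetical. Given a listing of the checkpoint directory, it picks the second-newest checkpoint to restore from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class PenultimateCheckpoint {

    /** Parses the numeric id from a name like "chk-42"; returns -1 if the name does not match. */
    public static long checkpointId(String dirName) {
        if (!dirName.startsWith("chk-")) {
            return -1L;
        }
        try {
            return Long.parseLong(dirName.substring(4));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    /** Returns the second-newest checkpoint directory name, if at least two checkpoints exist. */
    public static Optional<String> penultimate(List<String> dirNames) {
        List<String> checkpoints = new ArrayList<>();
        for (String name : dirNames) {
            if (checkpointId(name) >= 0) {
                checkpoints.add(name); // skip entries that are not checkpoint directories
            }
        }
        // Sort newest first by checkpoint id.
        checkpoints.sort((a, b) -> Long.compare(checkpointId(b), checkpointId(a)));
        return checkpoints.size() >= 2 ? Optional.of(checkpoints.get(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical listing of a job's checkpoint directory with 3 retained checkpoints.
        List<String> retained = List.of("chk-118", "chk-120", "shared", "taskowned", "chk-119");
        System.out.println(penultimate(retained).orElse("no fallback checkpoint available"));
    }
}
```

The selected directory would then be passed as the restore path when resubmitting the job. Sorting by the numeric id rather than by name keeps `chk-100` ahead of `chk-99`.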

            People

              Assignee: Unassigned
              Reporter: ALVINWJ JIAN WANG