  1. Beam
  2. BEAM-10592

HDFS file writing has intermittent failures from 2.16 (and above)

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.16.0, 2.18.0, 2.19.0, 2.22.0
    • None
    • None
    • Environment: Azure Databricks

    Description

      After upgrading from Beam 2.13.0 to 2.16.0 or later, our pipelines running on Spark with HDFS fail intermittently.

      Platform: Azure Databricks.

      Beam 2.13.0 works fine. The failures appear only after migrating to 2.16 or later, and only on large jobs (smaller jobs run fine).

      Caused by: java.io.IOException: Unable to rename resource wasbs://****/output/npstand75k0727_1/np/.temp-beam-64a00562-5dcd-4bcd-9c5a-be7cff1231f3/483d5498-ed9c-46fd-b1ce-8647fa5c8a06 to wasbs://*****/output/npstand75k0727_1/np/confinements/part-00000-of-00001.txt. No further information provided by underlying filesystem.
          at org.apache.beam.sdk.io.hdfs.HadoopFileSystem.rename(HadoopFileSystem.java:287)
          at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:327)
          at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:755)
          at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:850)
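
      The "No further information provided by underlying filesystem" text suggests that the rename reported failure without raising an exception of its own, so the trace alone does not say why. Below is a minimal sketch of the pre-checks one might log around such a rename to narrow the cause. It uses only java.nio on a local filesystem as a stand-in. It is not Beam or Hadoop code, and RenameDiagnostics and diagnose are hypothetical names introduced for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam or Hadoop) listing common reasons a rename can fail. */
public final class RenameDiagnostics {
    private RenameDiagnostics() {}

    /**
     * Inspects the source and destination before a rename and returns
     * human-readable reasons the rename would be expected to fail.
     * An empty list means none of these particular checks found a problem.
     */
    public static List<String> diagnose(Path src, Path dst) {
        List<String> reasons = new ArrayList<>();
        // The temp file may already have been moved or cleaned up by another attempt.
        if (!Files.exists(src)) {
            reasons.add("source does not exist: " + src);
        }
        // A previous attempt may already have produced the final output file.
        if (Files.exists(dst)) {
            reasons.add("destination already exists: " + dst);
        }
        // The output directory may be missing or may not be a directory.
        Path parent = dst.getParent();
        if (parent != null && !Files.isDirectory(parent)) {
            reasons.add("destination parent is not a directory: " + parent);
        }
        return reasons;
    }
}
```

      Against a real wasbs:// path the equivalent checks would have to go through the Hadoop filesystem client. This sketch only illustrates which conditions are worth logging when the failure reproduces.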

          People

            Assignee: Unassigned
            Reporter: Bharath (bkotha)
            Votes: 0
            Watchers: 4