  1. Beam
  2. BEAM-10261

[FileIO] Unexpected exception thrown when retrieving a GCS file with a space inside path

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P1
    • Resolution: Fixed
    • Affects Version/s: 2.20.0, 2.21.0, 2.22.0, 2.23.0, 2.24.0, 2.25.0
    • Fix Version/s: 2.26.0
    • Component/s: io-java-gcp
    • Environment: Google Cloud Dataflow

    Description

      Hi,

      I am using a PTransform class to retrieve Google Cloud Storage files with FileIO. It worked very well before version 2.20.0.

      I upgraded my Beam library last week to 2.20.0 and 2.21.0, and now I get an unexpected exception when I retrieve files that have a space in the path:

      Error message from worker: java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.io.FileNotFoundException: Item not found: 'gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<1710RH600@optimashipbroking.com /body.txt'. If you enabled STRICT generation consistency, it is possible that the live version is still available but the intended generation is deleted.
      org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:184)
      

       

      Please note that the following gsutil command works:

      gsutil ls "gs://[MY_BUCKET]/2017/09/12/3d9d7cc8-e970-42f8-9f24-7d9b70989033/31/a9/ba/<1710RH600@optimashipbroking.com /body.txt"

       

      Here is my code:

      public PCollection<KV<String, byte[]>> expand(PBegin begin) {
          PCollection<KV<String, byte[]>> files = begin
              .apply(FileIO.match()
                  .filepattern("gs://[MY_BUCKET]/**/body.txt")
                  .withEmptyMatchTreatment(EmptyMatchTreatment.ALLOW))
              .apply(FileIO.readMatches())
              .apply("Extract key",
                  ParDo.of(
                      new DoFn<ReadableFile, KV<String, byte[]>>() {
                          @ProcessElement
                          public void processElement(ProcessContext c) throws IOException {
                              ReadableFile f = c.element();
                              c.output(KV.of(f.getMetadata().resourceId().toString(), f.readFullyAsBytes()));
                          }
                      }
                  )
              );
      
          return files;
      }
      

       

      Maybe I just need to find a way to escape the file path, but I don't know how.
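
      For illustration only, percent-encoding a path that contains a space can be sketched with the standard java.net.URI class. The class name GcsPathEscapeSketch, the bucket "my-bucket" and the object path are hypothetical placeholders, and nothing in this report establishes that FileIO expects or accepts an escaped path, so this shows what escaping looks like rather than a confirmed workaround.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Beam: shows how java.net.URI treats a space in a path.
public class GcsPathEscapeSketch {

    // Builds a gs:// URI string; the multi-argument URI constructor
    // percent-encodes characters that are illegal in a path, such as a space.
    static String escape(String bucket, String objectPath) throws URISyntaxException {
        return new URI("gs", bucket, objectPath, null).toString();
    }

    // Returns true when the raw string parses as a URI without any escaping.
    static boolean parsesUnescaped(String raw) {
        try {
            new URI(raw);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // Placeholder bucket and path (assumptions for this sketch).
        String escaped = escape("my-bucket", "/a b/body.txt");
        System.out.println(escaped);

        // getPath() decodes the escape again, returning the original path.
        System.out.println(new URI(escaped).getPath());

        // The single-argument constructor rejects a raw space.
        System.out.println(parsesUnescaped("gs://my-bucket/a b/body.txt"));
    }
}
```

      The multi-argument constructor is used here because it quotes illegal characters itself, whereas the single-argument constructor expects an already valid URI string.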

       

      I hope you can help me. 

       

      Xavier

       

          People

            Assignee: Unassigned
            Reporter: Xavier HAUSHERR (xavier-shipfix)