  1. Beam
  2. BEAM-2751

Write PCollection elements to individual files

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: sdk-java-core

    Description

      I'd like to write elements as individual files.

      Rather than packing thousands of outputs into a handful of sharded files, as TextIO does (output-00000-of-00005, output-00001-of-00005, ...), I want to write each element to its own file.

      So if I used WholeFileIO from BEAM-2750 to read in three files (hi.txt, what.txt, and yes.txt), I'd like to write the processed contents out to individual files with user- or data-defined filenames (like hi-modified.txt, what-modified.txt, and yes-modified.txt).

      With a WholeFileIO, this would look like:

      PCollection<KV<String, byte[]>> fileNamesAndBytes = p.apply("Read", WholeFileIO.read().from("/path/to/input/dir/*"));
      ...
      // Do stuff that changes contents and file names
      PCollection<KV<String, byte[]>> modifiedFileNamesAndBytes = ...
      ...
      modifiedFileNamesAndBytes.apply("Write", WholeFileIO.write().to("/path/to/output/dir/"));
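
To make the requested semantics concrete, here is a minimal plain-Java sketch of what the write step would do for each element: take a (file name, contents) pair and write it to its own file under the output directory. This is not Beam API and not the proposed WholeFileIO; the class name PerElementFileWriter and the method writeEach are hypothetical, and a Map stands in for the PCollection<KV<String, byte[]>>. A real transform would go through Beam's filesystem abstraction rather than java.nio.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: one output file per element, named by the element's key.
public class PerElementFileWriter {

    /** Writes each (file name, contents) entry to its own file under outputDir. */
    public static void writeEach(Path outputDir, Map<String, byte[]> fileNamesAndBytes)
            throws IOException {
        Path root = outputDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        for (Map.Entry<String, byte[]> element : fileNamesAndBytes.entrySet()) {
            Path target = root.resolve(element.getKey()).normalize();
            // Reject names such as "../x" or absolute paths that would escape the output directory.
            if (!target.startsWith(root)) {
                throw new IllegalArgumentException("File name escapes output dir: " + element.getKey());
            }
            // Allow data-defined names that contain subdirectories.
            Files.createDirectories(target.getParent());
            // Creates the file, or truncates it if it already exists.
            Files.write(target, element.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the processed PCollection<KV<String, byte[]>>.
        Map<String, byte[]> elements = new LinkedHashMap<>();
        elements.put("hi-modified.txt", "hi".getBytes(StandardCharsets.UTF_8));
        elements.put("what-modified.txt", "what".getBytes(StandardCharsets.UTF_8));
        elements.put("yes-modified.txt", "yes".getBytes(StandardCharsets.UTF_8));

        Path out = Files.createTempDirectory("per-element-output");
        writeEach(out, elements);
        System.out.println("Wrote " + elements.size() + " files to " + out);
    }
}
```

Unlike sharded output, the number and names of the files here are determined entirely by the data, which is the behavior this ticket asks for.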
      

      This ticket complements BEAM-2750.

          People

            jkff Eugene Kirpichov
            christophhebert Christopher Hebert