CRUNCH-663: Expose Record-level File Path to Processing Functions

Project: Crunch (Retired)

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Core
    • Labels: None

    Description

      In some of our processing pipelines we need to know the file path that each record came from.  It would be useful to expose this path to the DoFns in our pipelines.

      The same request came up on the mailing list a little over a year ago:
      http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34AriP4weTw@mail.gmail.com%3E

      Unfortunately, that thread ended without a resolution.

      I will use the comments section and a patch to propose a simple, albeit slightly hacky, solution.  An alternative would be to create a new Source that provides a PCollection<Pair<Path, Record>>, but I'm not sure how much effort that would take.
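
      To make the shape of that alternative concrete, here is a minimal sketch in plain Java, with no Crunch or Hadoop dependencies. It is not the attached patch and not a Crunch API: the class names PathTaggedRecords and PathAndRecord and the method readWithPaths are hypothetical, a record is assumed to be one line of text, and java.nio.file.Path stands in for the Hadoop Path. It only shows the idea of pairing every record with the file it was read from, so that downstream processing can see both.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models the idea of a source that emits
// (path, record) pairs, using plain JDK types instead of Crunch types.
class PathTaggedRecords {

    /** Stand-in for Pair<Path, Record>; a record here is one line of text. */
    public static final class PathAndRecord {
        public final Path path;
        public final String record;

        public PathAndRecord(Path path, String record) {
            this.path = path;
            this.record = record;
        }
    }

    /** Reads every line of every input file and tags it with its source file. */
    public static List<PathAndRecord> readWithPaths(List<Path> inputs) throws IOException {
        List<PathAndRecord> out = new ArrayList<>();
        for (Path input : inputs) {
            for (String line : Files.readAllLines(input)) {
                out.add(new PathAndRecord(input, line));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Write two small input files into a temporary directory.
        Path dir = Files.createTempDirectory("path-tagged");
        Path a = Files.write(dir.resolve("a.txt"), List.of("r1", "r2"));
        Path b = Files.write(dir.resolve("b.txt"), List.of("r3"));

        // Each record is printed next to the name of the file it came from.
        for (PathAndRecord pr : readWithPaths(List.of(a, b))) {
            System.out.println(pr.path.getFileName() + "\t" + pr.record);
        }
    }
}
```

      In a real pipeline the pairing would have to happen where records are read from the input split, which is the part whose effort is unclear.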

      Attachments

        1. CRUNCH-663-v2.patch
          6 kB
          Ben Roling
        2. CRUNCH-663.patch
          3 kB
          Ben Roling

          People

            Assignee: Josh Wills (jwills)
            Reporter: Ben Roling (ben.roling)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: