CRUNCH-543: AvroPathPerKeyTarget copy nested subdirectories

    Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: IO
    • Labels:
      None

      Description

      When AvroPathPerKeyTarget writes each record to a subpath of the output directory named by its String key, the key may itself contain path separators and so name several levels of subfolders:

      // The key "foo/bar" names two directory levels under the output path.
      Pair<String, String> kv = new Pair<String, String>("foo/bar", "value");
      PTable<String, String> kvs = pipeline.create(Arrays.asList(kv), Avros.tableOf(Avros.strings(), Avros.strings()));
      PTables.asPTable(kvs).write(new AvroPathPerKeyTarget("output"));

      This throws the error:
      java.io.IOException: java.lang.IllegalArgumentException: Reducer output name 'bar' cannot be parsed
      at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:92)
      ...

      In AvroPathPerKeyTarget, the handleOutputs method would need to copy subfolders recursively (it currently checks only the first level of the output directory) to support keys that define multiple subfolders.
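
      A minimal sketch of the recursive move described above, not the attached patch. It assumes a local filesystem and uses java.nio.file in place of the Hadoop FileSystem API that handleOutputs actually works with. The class name NestedOutputMover and the method moveRecursively are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: illustrates recursing below the first directory level.
public class NestedOutputMover {

  /**
   * Moves every regular file found at any depth under srcRoot to the same
   * relative location under dstRoot, creating intermediate directories.
   * Returns the moved relative paths (with '/' separators), sorted.
   */
  public static List<String> moveRecursively(Path srcRoot, Path dstRoot) throws IOException {
    // Collect the files first so the tree is not modified while it is walked.
    List<Path> files = new ArrayList<>();
    try (Stream<Path> walk = Files.walk(srcRoot)) {
      walk.filter(Files::isRegularFile).forEach(files::add);
    }

    List<String> moved = new ArrayList<>();
    for (Path src : files) {
      // A key such as "foo/bar" yields a relative path like foo/bar/<file>.
      Path rel = srcRoot.relativize(src);
      Path dst = dstRoot.resolve(rel);
      Files.createDirectories(dst.getParent());
      Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
      moved.add(rel.toString().replace('\\', '/'));
    }
    Collections.sort(moved);
    return moved;
  }
}
```

      The point of the sketch is only that the walk descends through every level, so a file under foo/bar ends up under foo/bar in the destination instead of 'bar' being treated as an output file name.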

        Attachments

        1. CRUNCH-543c.patch
          3 kB
          Adric Eckstein
        2. CRUNCH-543b.patch
          2 kB
          Adric Eckstein
        3. CRUNCH-543.patch
          5 kB
          Josh Wills

              People

              • Assignee:
                Josh Wills (jwills)
                Reporter:
                Adric Eckstein (aeckstein)
              • Votes:
                0
                Watchers:
                4

                Dates

                • Created:
                  Updated: