Crunch / CRUNCH-67

Multiple writes in a pipeline are not performed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0
    • Component/s: Core, Scrunch
    • Labels: None

      Description

      Consider the following simple PipelineApp (in Scala) that:
      1. Reads in a text source.
      2. Cleans the text of non-alphabetic characters.
      3. Writes the sanitized text to a text file.
      4. Computes word counts from the text.
      5. Writes the word counts to a text file.

      When this code is executed, the write from step 5 is performed successfully, but the write from step 3 is not.

      object ShakesMultiWrite extends PipelineApp {

        val shakes = read(From.textFile("shakes.txt"))

        // Now let's clean up the text
        val cleanShakes = shakes.map { line =>
          val cleanText = line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()
          cleanText
        }

        cleanShakes.write(To.textFile("shakesText/cleanShakes"))

        // Count words
        val wordCounts = cleanShakes.flatMap { line =>
          line
            .split("""\W+""")           // Split the text into words.
            .filter(w => !w.isEmpty())  // Get rid of any empty words created.
        }.count()

        wordCounts.write(To.textFile("shakesText/wordCounts"))

        // Runs the pipeline
        run()
      }
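
      For reference, the content that the two writes are expected to produce can be sketched on plain Scala collections, without Crunch. This is only an illustration and not part of the original report or the attached patch: the object name ShakesExpectedOutputs, the helpers clean and wordCounts, and the sample lines are all hypothetical. The two regular expressions are copied unchanged from the code above. One detail worth noting is that [^A-Za-z\W] matches only digits and underscores, so punctuation and whitespace survive the cleaning step and are dealt with later by the \W+ split.

```scala
// Plain Scala collections stand in for the Crunch collections here;
// this sketch does not use or depend on Crunch/Scrunch.
object ShakesExpectedOutputs {

  // Same cleaning expression as in the report. The class [^A-Za-z\W] matches
  // word characters that are not ASCII letters (digits and underscore), so
  // punctuation and whitespace are left in place.
  def clean(line: String): String =
    line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()

  // Same split and filter as in the report, followed by an in-memory count.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\W+""").filter(w => !w.isEmpty))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from shakes.txt.
    val sample = Seq("To be, or not to be", "Act 1, Scene_2")

    val cleaned = sample.map(clean)   // content expected from the step 3 write
    val counts = wordCounts(cleaned)  // content expected from the step 5 write

    cleaned.foreach(println)
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

      In the pipeline from the report, both of these results should end up on disk; the bug is that only the word counts do.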

        Attachments

        1. ShakesMultiWrite.scala
          0.9 kB
          Kiyan Ahmadizadeh
        2. CRUNCH-67.patch
          13 kB
          Josh Wills

            People

            • Assignee: Josh Wills (jwills)
            • Reporter: Kiyan Ahmadizadeh (kiyan)