  1. Crunch (Retired)
  2. CRUNCH-67

Multiple writes in a pipeline are not performed

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0
    • Component/s: Core, Scrunch
    • Labels: None

    Description

      Consider the following simple PipelineApp (in Scala) that:
      1. Reads in a text source.
      2. Cleans the text of non-alphabetic characters.
      3. Writes the sanitized text to a text file.
      4. Computes word counts from the text.
      5. Writes the word counts to a text file.

      When this code is executed, the write from step 5 is performed successfully, but the write from step 3 is not.

      object ShakesMultiWrite extends PipelineApp {

        val shakes = read(From.textFile("shakes.txt"))

        // Now let's clean-up the text
        val cleanShakes = shakes.map { line =>
          val cleanText = line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()
          cleanText
        }

        cleanShakes.write(To.textFile("shakesText/cleanShakes"))

        // Count words
        val wordCounts = cleanShakes.flatMap { line =>
          line
            .split("""\W+""")          // Split the text into words.
            .filter(w => !w.isEmpty()) // Get rid of any empty words created.
        }.count()

        wordCounts.write(To.textFile("shakesText/wordCounts"))

        // Runs the pipeline
        run()
      }
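
      For checking the two outputs once both writes run, the expected contents can be modeled with plain Scala collections. This is only a sketch: it does not use Crunch or Scrunch, the names ExpectedOutputs, clean, and countWords are hypothetical, and the sample lines are made up for illustration. It mirrors the two transformations in the pipeline above, so cleanShakes corresponds to the output of step 3 and wordCounts to the output of step 5.

```scala
object ExpectedOutputs {

  // Same pattern as in the report. As written, it removes word characters
  // that are not ASCII letters (digits and underscores); punctuation and
  // whitespace are kept. The result is then lower-cased.
  def clean(line: String): String =
    line.replaceAll("""[^A-Za-z\W]""", "").toLowerCase()

  // In-memory stand-in for flatMap { ... }.count() in the pipeline:
  // split each line into words, drop empty strings, count occurrences.
  def countWords(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(line => line.split("""\W+""").filter(w => !w.isEmpty).toSeq)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  def main(args: Array[String]): Unit = {
    // Made-up sample input standing in for shakes.txt.
    val shakes = Seq("To be, or not to be", "Act 1, Scene_2")

    val cleanShakes = shakes.map(clean)       // what step 3 should write
    val wordCounts = countWords(cleanShakes)  // what step 5 should write

    cleanShakes.foreach(println)
    wordCounts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

      Comparing shakesText/cleanShakes against the cleanShakes values here is a quick way to confirm that the step 3 write actually happened, rather than only the step 5 write.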

      Attachments

        1. ShakesMultiWrite.scala
          0.9 kB
          Kiyan Ahmadizadeh
        2. CRUNCH-67.patch
          13 kB
          Josh Wills

          People

            Assignee: Josh Wills (jwills)
            Reporter: Kiyan Ahmadizadeh (kiyan)
            Votes: 0
            Watchers: 3