Nutch / NUTCH-667

Input Format for working with Content in Hadoop Streaming

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels: None
    • Environment: All

      Description

      This is a ContentAsText input format that replaces line endings with spaces, so that Nutch content can be used more effectively inside Hadoop streaming jobs. Hadoop streaming allows MapReduce jobs to be written in any language that can communicate through stdin and stdout.
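
      To illustrate why flattening line endings matters, here is a minimal sketch of a streaming mapper in Python. It is not taken from the attached patch. It assumes each record reaches the mapper as a single line of the form key, tab, content text, which is Hadoop streaming's default text convention; the exact key and value layout produced by this input format is an assumption here. The function names `parse_record`, `map_lines`, and `main` are hypothetical.

```python
import sys


def parse_record(line):
    """Split one streaming input line into (key, text).

    Assumption: the key (for example a URL) and the flattened content are
    separated by the first tab on the line.
    """
    key, _, text = line.rstrip("\n").partition("\t")
    return key, text


def map_lines(lines):
    """Yield one 'key<TAB>token_count' output line per input record."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, text = parse_record(line)
        # Because line endings inside the content were replaced with spaces,
        # the whole document is available on this one line.
        yield f"{key}\t{len(text.split())}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin and write mapper output to stdout."""
    for out in map_lines(stdin):
        stdout.write(out + "\n")
```

      To run this as a mapper, call `main()` at the end of the script. If the content kept its original line endings, a line-oriented mapper like this one would see each document split across many unrelated input lines, which is the problem the input format is meant to avoid.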

        Attachments

        1. NUTCH-667-1-20081126.patch
          3 kB
          Dennis Kubes

            People

            • Assignee: musepwizard Dennis Kubes
            • Reporter: musepwizard Dennis Kubes
            • Votes: 0
            • Watchers: 0
