  1. Mahout
  2. MAHOUT-249

Make WikipediaXmlSplitter able to write the chunks directly to HDFS or S3

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.3
    • Component/s: Classification
    • Labels:
      None

      Description

      By using the Hadoop FileSystem abstraction, it should be possible to write the chunks directly to HDFS or S3, instead of first writing them to the local hard drive and then uploading them.
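
      The idea in the description can be sketched as follows: the splitter writes each chunk to an output stream obtained from a pluggable sink, so the code never assumes a local file. This is a minimal illustration only, not the code from the attached patches. The names ChunkSink, InMemorySink and writeChunks are hypothetical, chunks are cut by page count purely for brevity, and the sink here is in-memory so that the sketch needs nothing beyond the JDK. With Hadoop on the classpath, a sink would presumably obtain its streams from FileSystem.get(URI, Configuration) and FileSystem.create(Path), as noted in the comments.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkWriterSketch {

    /**
     * Hypothetical abstraction: opens an output stream for a named chunk.
     * Assumption: a Hadoop-backed implementation could return
     *   FileSystem.get(URI.create(outputDir), conf).create(new Path(outputDir, chunkName))
     * so the same splitter code would target hdfs:// or s3 style URIs.
     */
    public interface ChunkSink {
        OutputStream open(String chunkName) throws IOException;
    }

    /** In-memory sink, used only so this sketch runs without Hadoop. */
    public static class InMemorySink implements ChunkSink {
        public final Map<String, ByteArrayOutputStream> chunks = new LinkedHashMap<>();

        @Override
        public OutputStream open(String chunkName) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            chunks.put(chunkName, out);
            return out;
        }
    }

    /**
     * Writes the pages into chunks of at most maxPagesPerChunk pages each,
     * one page per line, and returns the number of chunks written.
     */
    public static int writeChunks(List<String> pages, int maxPagesPerChunk, ChunkSink sink) {
        int chunkCount = 0;
        for (int start = 0; start < pages.size(); start += maxPagesPerChunk) {
            int end = Math.min(start + maxPagesPerChunk, pages.size());
            String name = String.format("chunk-%04d.xml", chunkCount + 1);
            // The stream comes from the sink, never from a local java.io.File.
            try (Writer writer = new OutputStreamWriter(sink.open(name), StandardCharsets.UTF_8)) {
                for (String page : pages.subList(start, end)) {
                    writer.write(page);
                    writer.write('\n');
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            chunkCount++;
        }
        return chunkCount;
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        List<String> pages = Arrays.asList("<page>A</page>", "<page>B</page>", "<page>C</page>");
        int count = writeChunks(pages, 2, sink);
        System.out.println(count + " chunks: " + sink.chunks.keySet());
    }
}
```

      Keeping the destination behind a single interface means the splitting logic stays the same whether the chunks end up on local disk, HDFS or S3; only the sink changes.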

        Attachments

        1. MAHOUT-249-2.patch
          7 kB
          Olivier Grisel
        2. MAHOUT-249-v2.patch
          5 kB
          Olivier Grisel
        3. MAHOUT-249-WikipediaXMLSplitterHDFS.patch
          5 kB
          Olivier Grisel

            People

            • Assignee: Olivier Grisel (ogrisel)
            • Reporter: Olivier Grisel (ogrisel)