Mahout / MAHOUT-249

Make WikipediaXmlSplitter able to write the chunks directly to HDFS or S3


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.3
    • Component/s: None
    • Labels: None

    Description

      By using Hadoop's FileSystem abstraction, WikipediaXmlSplitter should be able to write the chunks straight to HDFS or S3, instead of first writing them to the local hard drive and then uploading them.
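The idea can be illustrated with a minimal sketch, assuming the splitter only needs a way to open one output stream per chunk. This is not Mahout's actual WikipediaXmlSplitter: `ChunkOpener`, `XmlChunkSplitter`, the `chunk-NNNN.xml` naming and the simplified `<mediawiki>` wrapper are all hypothetical. The splitter groups whole `<page>` elements into chunks of roughly `maxChars` characters and never decides for itself where a chunk is stored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sink abstraction: opens the output stream for one named chunk. */
interface ChunkOpener {
    OutputStream open(String chunkName) throws IOException;
}

/** Hypothetical splitter for illustration only (not Mahout's WikipediaXmlSplitter). */
final class XmlChunkSplitter {
    // Simplified wrapper elements (assumption); a real dump header carries namespaces and <siteinfo>.
    private static final String HEADER = "<mediawiki>\n";
    private static final String FOOTER = "</mediawiki>\n";

    private XmlChunkSplitter() {}

    /** Groups whole <page> elements into chunks of at least maxChars; returns the number of chunks written. */
    static int split(Reader dump, ChunkOpener opener, int maxChars) throws IOException {
        BufferedReader in = new BufferedReader(dump);
        StringBuilder chunk = new StringBuilder();
        int chunkCount = 0;
        boolean inPage = false;
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.startsWith("<page>")) {
                inPage = true;
            }
            if (!inPage) {
                continue; // lines outside <page> (dump header, <siteinfo>, footer) are dropped
            }
            chunk.append(line).append('\n');
            if (trimmed.endsWith("</page>")) {
                inPage = false;
                // Only cut at a page boundary, so no page is ever split across two chunks.
                if (chunk.length() >= maxChars) {
                    writeChunk(opener, ++chunkCount, chunk);
                    chunk.setLength(0);
                }
            }
        }
        if (chunk.length() > 0) {
            writeChunk(opener, ++chunkCount, chunk); // flush the last, possibly smaller, chunk
        }
        return chunkCount;
    }

    private static void writeChunk(ChunkOpener opener, int index, CharSequence body) throws IOException {
        String name = String.format("chunk-%04d.xml", index);
        // The opener decides where the bytes go; closing the writer closes the stream it returned.
        try (Writer out = new OutputStreamWriter(opener.open(name), StandardCharsets.UTF_8)) {
            out.write(HEADER);
            out.append(body);
            out.write(FOOTER);
        }
    }
}
```

With Hadoop on the classpath, the opener could be `name -> fs.create(new Path(outputDir, name))`, where `outputDir` is a `Path` built from an `hdfs://` or S3 URI and `fs = outputDir.getFileSystem(conf)`, assuming the matching FileSystem implementation and credentials are configured. Because the URI scheme selects the storage, the same splitting code writes to local disk, HDFS or S3 with no separate upload step.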

      Attachments

        1. MAHOUT-249-2.patch
          7 kB
          Olivier Grisel
        2. MAHOUT-249-v2.patch
          5 kB
          Olivier Grisel
        3. MAHOUT-249-WikipediaXMLSplitterHDFS.patch
          5 kB
          Olivier Grisel


          People

            Assignee: Olivier Grisel
            Reporter: Olivier Grisel
            Votes: 0
            Watchers: 0
