Mahout / MAHOUT-249

Make WikipediaXmlSplitter able to write the chunks directly to HDFS or S3

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2
    • Fix Version/s: 0.3
    • Component/s: Classification
    • Labels: None

    Description

    By using the Hadoop FileSystem (FS) abstraction, it should be possible to write the chunks directly to HDFS or S3 instead of first writing them to the local hard drive and then uploading them.
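    A minimal sketch of the idea, assuming a hypothetical `ChunkSink` hook (not Mahout's actual API and not taken from the attached patches): the splitter only asks for an `OutputStream` per chunk, so the same splitting code can target local disk, memory, or a Hadoop filesystem. The chunking rules shown here (whole `<page>` elements only, a byte-size threshold, a `<mediawiki>` wrapper per chunk) and the chunk naming scheme are also illustrative assumptions. The `main` method uses an in-memory sink so the sketch runs without Hadoop.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChunkSplitterSketch {

  /** Hypothetical hook: opens the output stream for one chunk. With Hadoop this could wrap FileSystem.create(Path). */
  public interface ChunkSink {
    OutputStream open(String chunkName) throws IOException;
  }

  /**
   * Copies <page> elements from the dump into numbered chunks of roughly maxChunkBytes,
   * never splitting a page across two chunks. Returns the number of chunks written.
   */
  public static int split(Reader dump, ChunkSink sink, String prefix, long maxChunkBytes) throws IOException {
    BufferedReader in = new BufferedReader(dump);
    StringBuilder page = null; // non-null while we are inside a <page> element
    Writer chunk = null;       // currently open chunk, or null between chunks
    long chunkBytes = 0;
    int chunkCount = 0;
    String line;
    while ((line = in.readLine()) != null) {
      if (line.trim().startsWith("<page>")) {
        page = new StringBuilder();
      }
      if (page == null) {
        continue; // skip lines outside <page> elements (header, closing tag)
      }
      page.append(line).append('\n');
      if (line.trim().endsWith("</page>")) {
        String text = page.toString();
        page = null;
        if (chunk == null) {
          // Open the next chunk lazily, only when there is a page to write.
          chunkCount++;
          String name = String.format("%s%04d.xml", prefix, chunkCount);
          chunk = new OutputStreamWriter(sink.open(name), StandardCharsets.UTF_8);
          chunk.write("<mediawiki>\n");
          chunkBytes = 0;
        }
        chunk.write(text);
        chunkBytes += text.getBytes(StandardCharsets.UTF_8).length;
        if (chunkBytes >= maxChunkBytes) {
          // Size threshold reached: close this chunk after a complete page.
          chunk.write("</mediawiki>\n");
          chunk.close();
          chunk = null;
        }
      }
    }
    if (chunk != null) {
      chunk.write("</mediawiki>\n");
      chunk.close();
    }
    return chunkCount;
  }

  public static void main(String[] args) throws IOException {
    String dump = "<mediawiki>\n<page>\n<title>A</title>\n</page>\n<page>\n<title>B</title>\n</page>\n"
        + "<page>\n<title>C</title>\n</page>\n</mediawiki>\n";
    // In-memory sink standing in for local disk, HDFS or S3.
    Map<String, ByteArrayOutputStream> store = new LinkedHashMap<>();
    ChunkSink memorySink = name -> {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      store.put(name, buf);
      return buf;
    };
    int n = split(new StringReader(dump), memorySink, "chunk-", 30);
    // prints "3 chunks: [chunk-0001.xml, chunk-0002.xml, chunk-0003.xml]"
    System.out.println(n + " chunks: " + store.keySet());
  }
}
```

    With Hadoop on the classpath, a sink along the lines of `name -> fs.create(new Path(outputDir, name))`, where `fs = FileSystem.get(URI.create(outputDir), conf)`, would stream each chunk straight to the filesystem selected by the URI scheme (for example `hdfs://` or `file://`). Passing in a stream factory rather than a local directory is what removes the intermediate copy on the local disk.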

    Attachments

    • MAHOUT-249-2.patch (7 kB, Olivier Grisel)
    • MAHOUT-249-v2.patch (5 kB, Olivier Grisel)
    • MAHOUT-249-WikipediaXMLSplitterHDFS.patch (5 kB, Olivier Grisel)

    People

    • Assignee: Olivier Grisel
    • Reporter: Olivier Grisel
    • Votes: 0
    • Watchers: 0